


THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 








Volume 38 October, 1947 Number 6 








FACTORING FACTORS 


KARL J. HOLZINGER 
The University of Chicago 


A set of variables or tests which all have positive intercorrela- 
tions may be factored in terms of orthogonal factors such as 
appear in the centroid, principal, or bi-factor solutions. Having 
once obtained such an orthogonal solution, in which all factors 
are uncorrelated by assumption, one usually finds it possible to 
rotate to an oblique solution, in which the new oblique factors 
are intercorrelated. The primary factor analysts generally start 
with a centroid solution and rotate to oblique primary factors. 

Some of these analysts do not stop here, however. Noting 
that the primary or oblique ‘first-order’ factors are themselves 
correlated, they next factor ‘first-order’ factors to obtain ‘second- 
order’ factors. Sometimes the factoring stops at this stage, but 
recently several such analysts have had data which tempt them 
to factor the oblique ‘second-order’ factors to obtain ‘third- 
order’ factors. In one case at least an analyst has found it 
desirable to go on to the next stage and factor ‘third-order’ 
factors to obtain a solution in terms of ‘fourth-order’ factors. 
Such a process would have to stop when a single ‘nth-order’ 
factor was obtained. There seems to be a mounting enthusiasm 
as these analysts proceed to factors of higher and higher order, 
perhaps in the hope that, if they can factor far enough in the 
above manner, they will arrive at some sort of psychological 
ultimates. 

The present article is concerned with the statistical implica- 
tions of higher-order factoring because it is believed these should 
be clarified before attempts be made to attach ‘psychological 
meaning’ to higher-order factors. Inasmuch as these implica- 
tions have no bearing upon the actual nature of the original tests 

employed, I shall employ an artificial example prepared for me 
by Dr. Swineford with the following statistical characteristics: 
(a) A hypothetical modified bi-factor pattern of thirteen tests 
and six orthogonal common factors was first set up so as to yield 
higher-order factors up to the third. An artificial centroid 
pattern could have been employed instead, but this would have 
involved much more statistical computation, and would have 
been no better for illustrative purposes. The unique factors 
have been omitted in Table II for simplicity. It will be noted
that six overlapping common factors have been postulated.
These are designated as G, A, B, C, D, and E.


TABLE I. INTERCORRELATIONS (COMMUNALITIES IN THE DIAGONAL)

Variable    1    2    3    4    5    6    7    8    9   10   11   12   13

    1     .73
    2     .79  .89
    3     .66  .78  .72
    4     .21  .35  .42  .98
    5     .06  .10  .12  .70  .68
    6     .12  .20  .24  .63  .48  .57
    7     .09  .15  .18  .70  .62  .67  .83
    8     .21  .35  .42  .77  .46  .64  .69  .81
    9     .24  .40  .48  .77  .40  .47  .45  .68  .89
   10     .12  .20  .24  .63  .48  .41  .47  .48  .71  .77
   11     .15  .25  .30  .35  .10  .20  .15  .35  .68  .62  .83
   12     .18  .30  .36  .42  .12  .24  .18  .42  .72  .60  .87  .97
   13     .15  .25  .30  .35  .10  .20  .15  .35  .56  .44  .74  .89  .90


(b) The intercorrelations and communalities from this artificial 
pattern are presented in Table I. In order to illustrate how 
these were obtained we may select the first two rows of Table II, 
written as pattern equations as follows: 


\[
\begin{aligned}
z_1 &= .3G + .8A + a_1U_1 + 0 \\
z_2 &= .5G + .8A + 0 + a_2U_2
\end{aligned}
\tag{1}
\]

where $U_1$ and $U_2$ are the unique factors. The correlation $r_{12}$
is then given by multiplying the coefficients of G and A as
follows:

\[ r_{12} = .3 \times .5 + .8 \times .8 = .15 + .64 = .79 \]

The communalities for the first two tests are

\[ h_1^2 = .3^2 + .8^2 = .73; \qquad h_2^2 = .5^2 + .8^2 = .89 \]



All other entries in Table I were obtained in a similar manner. 
If the correlations in the lower left corner of this table are 
repeated in the upper right corner we would have a square matrix 
whose rank is exactly six. This means that there are six linearly 
independent factors, as illustrated in Table II. 
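
The arithmetic just described can be checked mechanically. The following is a minimal sketch, not part of the original computations: it assumes the loadings of Table II as read from the pattern, and the helper names `reproduced` and `matrix_rank` are hypothetical ones introduced only for this illustration.

```python
# Loadings from Table II on the orthogonal common factors G, A, B, C, D, E.
PATTERN = [
    [.3, .8, 0, 0, 0, 0],   # Test 1
    [.5, .8, 0, 0, 0, 0],   # Test 2
    [.6, .6, 0, 0, 0, 0],   # Test 3
    [.7, 0, .7, 0, 0, 0],   # Test 4
    [.2, 0, .8, 0, 0, 0],   # Test 5
    [.4, 0, .5, .4, 0, 0],  # Test 6
    [.3, 0, .7, .5, 0, 0],  # Test 7
    [.7, 0, .4, .4, 0, 0],  # Test 8
    [.8, 0, .3, 0, .4, 0],  # Test 9
    [.4, 0, .5, 0, .6, 0],  # Test 10
    [.5, 0, 0, 0, .7, .3],  # Test 11
    [.6, 0, 0, 0, .6, .5],  # Test 12
    [.5, 0, 0, 0, .4, .7],  # Test 13
]


def reproduced(i, j):
    """Correlation of tests i and j (numbered from 1); the communality when i == j."""
    return sum(a * b for a, b in zip(PATTERN[i - 1], PATTERN[j - 1]))


def matrix_rank(matrix, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    rows = [row[:] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        if rank == len(rows):
            break
        pivot = max(range(rank, len(rows)), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


n = len(PATTERN)
# Full square matrix with communalities in the diagonal.
R = [[reproduced(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]

print(round(reproduced(1, 2), 2))                              # 0.79
print(round(reproduced(1, 1), 2), round(reproduced(2, 2), 2))  # 0.73 0.89
print(matrix_rank(R))
```

The rank is computed on the square matrix with communalities in the diagonal, which is the matrix referred to in the text.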


TABLE II. HYPOTHETICAL MODIFIED BI-FACTOR PATTERN

                       Factor
Variable    G    A    B    C    D    E

    1      .3   .8
    2      .5   .8
    3      .6   .6

    4      .7        .7
    5      .2        .8

    6      .4        .5   .4
    7      .3        .7   .5
    8      .7        .4   .4

    9      .8        .3        .4
   10      .4        .5        .6

   11      .5                  .7   .3
   12      .6                  .6   .5
   13      .5                  .4   .7























(c) Table II was so devised as to yield five ‘first-order’ 
factors, then three ‘second-order’ factors, and, finally, one ‘third- 
order’ factor. These sets of oblique factors are denoted by Greek 
letters with subscripts indicating the order of the factor. For 
example, $\alpha_2$, $\beta_2$, and $\gamma_2$ are the three second-order factors.








Before proceeding to the analysis of the oblique factors I
should like to define certain terms which I use differently from
the primary factor analysts. Table II is called a ‘factor pattern.’
It is a tabular arrangement of the coefficients of a set of thirteen
equations, the first two of which are given by equations (1).
The numerical coefficients in these equations indicate the
‘loadings’ of the factors. The squares of these coefficients show
the factorial composition of the tests. Thus from equations
(1) the unit variance of Test 1 is made up of
$.09\sigma_G^2 + .64\sigma_A^2 + (1 - .73)\sigma_{U_1}^2$. The factorial
composition of the tests becomes much more complicated when the
factors are correlated.


TABLE III. OBLIQUE SOLUTION, FIRST-ORDER

                    S = Structure                         P = Pattern
Variable    α1     β1     γ1     δ1     ε1        α1      β1      γ1      δ1      ε1

    1      .836   .177   .169   .205   .173      .884   −.035   −.030   −.030   −.057
    2      .944   .296   .281   .342   .288      .946    .000   −.005    .003   −.006
    3      .829   .355   .337   .410   .346      .777    .043    .024    .033    .058
    4      .376   .978   .843   .798   .404      .058    .897    .028    .028    .050
    5      .107   .763   .626   .502   .115     −.118    .917   −.050   −.047   −.110
    6      .215   .639   .755   .502   .231     −.012    .001    .762   −.003   −.010
    7      .161   .742   .879   .524   .173     −.092    .052    .920   −.046   −.080
    8      .376   .736   .859   .661   .404      .102   −.054    .814    .042    .096
    9      .430   .715   .642   .912   .707      .102    .042    .039    .747    .093
   10      .215   .639   .546   .844   .599     −.101   −.042   −.043   1.014   −.096
   11      .268   .296   .281   .741   .880     −.013   −.358   −.008    .620    .547
   12      .322   .355   .337   .752   .984      .013   −.019    .004    .059    .941
   13      .268   .296   .281   .570   .912     −.005    .396    .002   −.704   1.302


The next term to be defined is ‘structure.’ This consists 
of a matrix of correlations between tests and factors. In Table II 
the factors are uncorrelated among themselves, so that the 
coefficients are not only pattern elements, but also structure 
elements. Thus in equations (1) .3 is the coefficient of G and 
also the correlation between Test 1 and the factor G. When 








factors are correlated, however, the pattern and structure values 
are quite different. If the reader is not a factorial analyst it 
may suffice to look at the entries in Table III. In the top row
the entry .177 under ‘structure’ is the correlation between Test 1
and the oblique factor $\beta_1$, while the entry −.035 under ‘pattern’
is the coefficient of the oblique factor $\beta_1$ in the complete linear
expression, which would be written,

\[ z_1 = .884\alpha_1 - .035\beta_1 - .030\gamma_1 - .030\delta_1 - .057\epsilon_1 + a_1U_1 \tag{2} \]


[In case the reader is a factor analyst, I have included a matrix 
formulation of the methods for computing the ‘structure’ and 
‘pattern’ of the above solution in a later paragraph. ] 

The primary factor analysts would call the elements of the
pattern in Table III evidence of ‘simple structure’ defined in
geometric terms, so that one might say their simple structure 
is my oblique pattern. It is perhaps unfortunate that this 
inconsistency in terms exists, but it is due to the fact that I 
introduced the ‘structure’ concept before the primary factor 
analysts got around to oblique factors, and no inconsistency in 
terms was then apparent. 

We shall next proceed to oblique factor analysis. An oblique 
factor is usually thought of as one that is an average of a sub- 
group of tests. In vector geometry it is represented by a vector 
which is the centroid (or approximately that) of a subgroup of 
test vectors. This is the concept I have always used, and it is 
also one employed by the oblique primary factor analysts. In 
order to exhibit the method in simple form we shall illustrate the
procedure for data of Tables I and II for the case of the oblique
factor $\alpha_1$. From these two tables it is apparent that Tests 1, 2,
and 3 form a cluster with higher correlations amongst themselves
than they have with the other ten variables. The factor $\alpha_1$
will be defined as the centroid of this cluster, and its expression
may be determined in the following simple manner: Add the first
three equations from Table II (in the common-factor space) to
obtain

\[ z_1 + z_2 + z_3 = 1.4G + 2.2A \tag{3} \]

Next, divide both sides of equation (3) by the square-root of
$(1.4)^2 + (2.2)^2 = 6.80$ giving

\[ \alpha_1 = .537G + .844A \tag{4} \]
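
The two steps just described (summing the cluster's pattern equations, then dividing by the length of the summed vector) may be sketched as follows. This is only an illustration under the loadings of Table II for Tests 1, 2, and 3; the function name `centroid_factor` is a hypothetical one chosen for the example.

```python
import math


def centroid_factor(cluster_rows):
    """Sum the loadings of a cluster of tests and scale the sum to unit length."""
    sums = [sum(column) for column in zip(*cluster_rows)]
    length = math.sqrt(sum(s * s for s in sums))
    return [s / length for s in sums]


# Loadings of Tests 1, 2, and 3 on G and A (Table II).
cluster = [[.3, .8], [.5, .8], [.6, .6]]

alpha_1 = centroid_factor(cluster)
print([round(c, 3) for c in alpha_1])  # [0.537, 0.844]
```

Dividing by the square root of the sum of squared coefficients is what makes the resulting factor a unit vector in the common-factor space, so that its coefficients serve as direction cosines.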








This ‘normalization’ of equation (3) has the effect of putting
the total, $z_1 + z_2 + z_3$, in standard form defined as the oblique
factor $\alpha_1$. By the same simple arithmetic we may obtain
expressions for all five oblique factors defined by the subgroups as
spaced in Table II:

\[
\begin{aligned}
\alpha_1 &= .537G + .844A \\
\beta_1 &= .591G + .806B \\
\gamma_1 &= .562G + .642B + .522C \\
\delta_1 &= .684G + .456B + .570D \\
\epsilon_1 &= .577G + .613D + .541E
\end{aligned}
\tag{5}
\]


Let us interpret equations (5). They represent a factor pattern
in the common-factor space. The factor G is common to
all five oblique factors, while the factor B is common to $\beta_1$,
$\gamma_1$, and $\delta_1$, and factor D is common to $\delta_1$ and $\epsilon_1$.
In the space here employed the original factors A, C and E are
unique factors. We thus
arrive quickly at the result that the mysterious second-order
factors amongst the first-order factors are nothing but G, B, and
D, which are three of the original orthogonal factors. When
second-order factors are defined in the above manner, it is
difficult to see how the primary factor analysts hope to obtain
any new interpretation from them, no matter what tests are used.


TABLE IV. CORRELATIONS OF FIRST-ORDER OBLIQUE FACTORS

Factor      α1      β1      γ1      δ1      ε1

  α1      1.000
  β1       .317   1.000
  γ1       .302    .850   1.000
  δ1       .367    .772    .677   1.000
  ε1       .310    .341    .324    .744   1.000


These analysts might reply that they would not proceed in the 
foregoing manner, but would first get the intercorrelations of the 
five oblique factors. These are simply obtained from equations
(5) as follows:

\[ r_{\alpha_1\beta_1} = .537 \times .591 = .317, \text{ etc.} \]

because equations (5) involve orthogonal factors. The complete
list of such intercorrelations is given in Table IV.
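
Since the factors G, A, B, C, D, and E are orthogonal, each intercorrelation is simply the sum of products of corresponding coefficients in equations (5). A brief sketch of that computation follows; the dictionary keys and the function name `correlation` are labels assumed for this illustration only.

```python
# Coefficients of equations (5) on the orthogonal factors G, A, B, C, D, E.
FIRST_ORDER = {
    "alpha1":   [.537, .844, 0, 0, 0, 0],
    "beta1":    [.591, 0, .806, 0, 0, 0],
    "gamma1":   [.562, 0, .642, .522, 0, 0],
    "delta1":   [.684, 0, .456, 0, .570, 0],
    "epsilon1": [.577, 0, 0, 0, .613, .541],
}


def correlation(name_a, name_b):
    """Sum of products of coefficients on the shared orthogonal factors."""
    pairs = zip(FIRST_ORDER[name_a], FIRST_ORDER[name_b])
    return sum(a * b for a, b in pairs)


names = list(FIRST_ORDER)
# Print the lower triangle in the layout of Table IV.
for i, row_name in enumerate(names):
    entries = [f"{correlation(row_name, col_name):.3f}" for col_name in names[:i]]
    print(row_name.ljust(9), "  ".join(entries))
```

For example, `correlation("alpha1", "beta1")` involves only the G coefficients, because G is the only factor the two share.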








If Table IV is analyzed by bi-factor methods we can get back 
exactly to equations (5). If, on the other hand, this table is 
analyzed by the centroid method with communalities in the 
diagonals, an orthogonal solution could be obtained which could 
then be rotated to other orthogonal or oblique reference axes, but 
it is not likely that the form (5) would be obtained. The fact 
remains, however, that if oblique factors are defined as above 
(and the primary factor analysts so define them), then the 
second-order factors may be interpreted as the original orthogonal 
factors or approximately as such. 

It really should not be necessary to go on to still higher-order 
factors to show that these are also the original orthogonal factors, 
but we shall do so for clarity and completeness. 

In the same manner in which equations (5) were obtained from
Table II, we next obtain a third-order factor pattern by using the
following grouping of oblique factors from equations (5):
$\alpha_2 = \alpha_1$; $\beta_2 = \beta_1, \gamma_1, \delta_1$;
$\gamma_2 = \delta_1, \epsilon_1$. Normalizing as before gives

\[
\begin{aligned}
\alpha_2 &= .537G + .844A \\
\beta_2 &= .667G + .691B + .189C + .207D \\
\gamma_2 &= .675G + .244B + .633D + .290E
\end{aligned}
\tag{6}
\]

These equations are again an orthogonal pattern in the common-factor
space, indicating that the third-order factors common to
the second-order factors $\alpha_2$, $\beta_2$, and $\gamma_2$ are again the original
orthogonal factors G, B, and D.

The intercorrelations of these second-order factors are readily
found to be $r_{\alpha_2\beta_2} = .358$, $r_{\alpha_2\gamma_2} = .362$, and
$r_{\beta_2\gamma_2} = .750$. A Spearman single-factor pattern from these
values would have the weights .416, .861, and .871.
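
The single-factor weights quoted above can be recovered from the three intercorrelations alone. The sketch below assumes the usual triad relation for a single common factor, in which each correlation is the product of the two weights concerned; the function name `spearman_weights` is hypothetical.

```python
import math


def spearman_weights(r_ab, r_ac, r_bc):
    """Weights a, b, c of one common factor such that r_ab = a*b, r_ac = a*c, r_bc = b*c."""
    a = math.sqrt(r_ab * r_ac / r_bc)
    b = math.sqrt(r_ab * r_bc / r_ac)
    c = math.sqrt(r_ac * r_bc / r_ab)
    return a, b, c


# Intercorrelations of the second-order factors alpha2, beta2, gamma2.
weights = spearman_weights(.358, .362, .750)
print([round(w, 3) for w in weights])  # [0.416, 0.861, 0.871]
```

With only three variables the single-factor fit is exact, so the products of the weights reproduce the three correlations.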

The point of this article is that the higher-order factor analysts 
are not getting anything psychologically new when they keep on 
factoring successive orders of factors. They may get functions of 
various combinations of ‘primaries,’ but this seems to be a 
circular and generally useless factorial procedure. 

The methodology followed above is described completely in 
the references at the end of this article, but it may be sufficient 
for the factor analyst to indicate briefly how Table III was 
obtained. If T denotes the matrix of direction cosines given
by the coefficients in equations (5) and B is the matrix of
coefficients in Table II, then the structure values are obtained from
the formula

\[ S = BT' \tag{7} \]

The oblique pattern P of Table III is obtained from S and the
matrix $\phi$ of Table IV by the equation

\[ P = S\phi^{-1} \tag{8} \]



These calculations are all fully explained in the first of the three 
references cited. 
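
For the reader who wishes to retrace formulas (7) and (8), a minimal sketch follows. It assumes the loadings of Table II and the three-decimal coefficients of equations (5), so the computed pattern values may differ from Table III in the last decimal place. The helper names `times_transpose` and `solve` are hypothetical, and the inverse in (8) is applied by solving a linear system rather than by forming the inverse explicitly.

```python
# Table II loadings (rows: Tests 1-13; columns: G, A, B, C, D, E).
B = [
    [.3, .8, 0, 0, 0, 0], [.5, .8, 0, 0, 0, 0], [.6, .6, 0, 0, 0, 0],
    [.7, 0, .7, 0, 0, 0], [.2, 0, .8, 0, 0, 0],
    [.4, 0, .5, .4, 0, 0], [.3, 0, .7, .5, 0, 0], [.7, 0, .4, .4, 0, 0],
    [.8, 0, .3, 0, .4, 0], [.4, 0, .5, 0, .6, 0],
    [.5, 0, 0, 0, .7, .3], [.6, 0, 0, 0, .6, .5], [.5, 0, 0, 0, .4, .7],
]
# Equations (5): direction cosines of the five oblique factors.
T = [
    [.537, .844, 0, 0, 0, 0],
    [.591, 0, .806, 0, 0, 0],
    [.562, 0, .642, .522, 0, 0],
    [.684, 0, .456, 0, .570, 0],
    [.577, 0, 0, 0, .613, .541],
]


def times_transpose(x, y):
    """Return the product x y' for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(rx, ry)) for ry in y] for rx in x]


def solve(a, b):
    """Solve a x = b (a square, b a matrix) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [a[i][:] + b[i][:] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


S = times_transpose(B, T)      # formula (7): S = B T'
PHI = times_transpose(T, T)    # intercorrelations of the oblique factors
# Formula (8): P = S phi^{-1}. Because phi is symmetric, solve phi P' = S'.
S_transposed = [list(col) for col in zip(*S)]
P = [list(col) for col in zip(*solve(PHI, S_transposed))]

print([round(v, 3) for v in S[0]])  # structure values for Test 1
print([round(v, 3) for v in P[0]])  # pattern values for Test 1
```

Here `PHI` is recomputed from `T` instead of being copied from Table IV, so its diagonal entries depart slightly from unity because of the rounding of the coefficients.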

Some primary factor analysts regard oblique factors as more 
‘natural’ than orthogonal ones. Such ‘naturalness,’ however, 
would appear to depend upon the assumed nature of the tests 
and some geometric principle such as ‘simple structure.’ Our 
view is that either orthogonal or oblique solutions such as those 
presented in Tables II and III are statistically and psycho- 
logically useful in interpreting the original data, and we should 
stop factoring here and not go on to higher-order factors. Instead 
of the form of solution depending upon an act of nature, we 
believe it depends largely upon statistical criteria in the mind 


of the factor analyst. 


REFERENCES 


Holzinger, Karl J., and Harman, Harry H. Factor Analysis.
Chicago: University of Chicago Press, 1941.
Holzinger, Karl J. “A Simple Method of Factor Analysis,”
Psychometrika, IX (December, 1944), 257-62.
Holzinger, Karl J. “Interpretation of Second-Order Factors,”
Psychometrika, X (March, 1945), 21-25.








THE POWER OF THE t TEST AND THE
ESTIMATION OF REQUIRED SAMPLE SIZE


WALTER L. DEEMER, JR. 


Air University, School of Aviation Medicine, Randolph Field, Texas 


Only rarely in published research in which the t test is used is
there any statement of the ‘power’ of the test. The ‘power’ of
a statistical test is defined as the probability of rejecting the
null hypothesis. If the result of the t test or the analysis of
variance test is ‘significant,’ i.e., if the null hypothesis is rejected,
the sample size was obviously sufficient, and the probability of 
error is clearly defined in terms of the rejection level or sig- 
nificance level chosen. But when the experimental results are 
such that the null hypothesis is not rejected, it is important to 
consider the power of the test. That is, it is important to 
consider the probability that the null hypothesis would have 
been rejected if it had been false. In a two-group comparison 
the sample may not lead to rejection of the null hypothesis 
merely because the sample was too small. The true difference, 
D, in the two mean values may be fairly large, yet a large propor- 
tion of random samples of size N from this population might 
not lead to rejection of the null hypothesis. This general con- 
cept of the power of a statistical test was developed over a decade 
ago by J. Neyman and E. S. Pearson.¹

Despite the fact that the concept is over ten years old, it seems 
to be little known, judging by research reports in current medical 
and psychological journals. It is the purpose of this paper to 
review the theory in simple language and give a formula for use 
in determining the required sample size to achieve the desired 
power. 

In drawing a conclusion from a statistical test, it is possible 
to commit either one of two kinds of error: 1) an ‘error of the 
first kind,’ which consists in rejecting a null hypothesis when the 





¹ J. Neyman and E. S. Pearson, “On the Problem of the Most Efficient
Tests of Statistical Hypotheses,” Phil. Trans. Roy. Soc. A, 231, 1933,
289-337.
J. Neyman and E. S. Pearson, “The Testing of Statistical Hypotheses in
Relation to Probabilities a Priori,” Proc. Camb. Phil. Soc. 29, 1933, 492.
J. Neyman and E. S. Pearson, “Contributions to the Theory of Testing
Statistical Hypotheses,” Statistical Research Memoirs, 1, 1936.



hypothesis is true; and 2) an ‘error of the second kind’ which con- 
sists in accepting a null hypothesis when it is false. The concept 
of errors of the first kind is well known. The probability of 
making an error of the first kind is sometimes called the ‘rejec- 
tion level’ or the ‘significance level.’ The logic of tests of 
significance is often explained in terms of the probability of an 
error of the first kind (we will call this $P_1$). For example,
Lindquist² states: “We may either accept or reject the hypothesis,
depending upon this relative frequency (of getting a
sample as divergent or more divergent in absolute magnitude). 
If the relative frequency is small, we have the alternatives: 
(a) of rejecting the hypothesis, maintaining that it is unreasonable 
to suppose that something has happened in our sample that 
would happen only very infrequently if the hypothesis were 
true.” Lindquist goes on to state: “The level of confidence
at which we may reject the hypothesis, then, depends (by 
definition) upon the relative frequency with which the hypo- 
thetical sampling error would be exceeded in absolute magnitude 
(without regard to direction) if the hypothesis were true.” 
But he fails to say why we should use the tails of the distribution 
as the rejection area. It would appear just as logical, from 
what Lindquist says, to use a little strip around the mean which 
contained five per cent of the total area. One of the few texts
which mentions errors of the second kind is Walker’s.³ She
gives an interesting example and states: “The first type of error
can be minimized by arbitrarily regulating the level of sig- 
nificance demanded, that is by rejecting a hypothesis only when 
there is a very small probability that if true it will produce the 
observed data. The regulation of the second kind of error is a 
problem for higher mathematics.” Actually, the regulation of 
the second kind of error is quite simple and requires no more 
mathematics than regulation of the first kind of error. 
In order to explain it, we will need a notation. Let

$P_1$ = probability of making an error of the first kind, i.e.,
rejecting a null hypothesis when it is true.
$P_2$ = probability of making an error of the second kind, i.e.,
accepting a null hypothesis when it is false.
D = the true difference between two population means
$\bar{x}_i$ = mean of the ith group
$n_i$ = number of cases in the ith group
$s^2 = \frac{1}{n_1 + n_2 - 2}\left(\Sigma_1(x - \bar{x}_1)^2 + \Sigma_2(x - \bar{x}_2)^2\right)$
$\Sigma_i$ = sum over the ith group
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s\sqrt{(n_1 + n_2)/(n_1 n_2)}}$ is the usual Student’s t.

² E. F. Lindquist, A First Course in Statistics, Boston: Houghton
Mifflin Co., 1942, p. 113.
³ H. M. Walker, Elementary Statistical Methods, New York: Henry Holt
and Co., 1943, pp. 288-291.


The usual method of testing hypotheses is to decide on a
rejection area based on t: if |t|* is so large as to occur very seldom
when the null hypothesis is true, the null hypothesis is rejected.
If we use the .05 level of significance and $n_1 + n_2 - 2 = 30$, we
reject the null hypothesis whenever |t| > 2.042, the value found
in the t table for n = 30 and p = .05. But what if |t| < 2.042?
We accept the null hypothesis, but we know that it is possible for
D to be greater than zero and yet for us to get a sample that gave
|t| < 2.042. The probability of this event is $P_2$, the probability
of an error of the second kind. $P_2$ is clearly a function of D.
If |D| is very nearly 0, then $P_2$ will be greater than if |D| is much
larger than 0. The above test, based on both tails of the t
distribution, is the symmetrical t test. It is used when either
positive or negative values of t are considered acceptable
alternatives to the null hypothesis. When only positive or negative
values, but not both, are acceptable alternatives, then we reject
only when t is large positive (or negative) and accept the null
hypothesis when t is large negative (or positive). This is the
‘one-tailed’ or ‘asymmetrical’ test. It is possible to use
asymmetrical tests based on two tails; for example, we could reject
the hypothesis whenever t was greater than 2, or less than −2.5.
In the following we shall use the term asymmetrical only for the
one-tail test. This is consistent with Neyman’s usage. The
asymmetrical test is somewhat easier to deal with and so we shall
consider it first. We shall consider the symmetrical test later.
Figures 1a to 1c will help to visualize the situation. Each figure
shows two curves: the t distribution under the null hypothesis,
mean = 0 (curve A), and the t distribution when the true
difference is $D_i$ (i = 1, 2, 3) (curve B).

* The vertical bars are used to denote the absolute value of the quantity
within the bars; i.e., the numerical value regardless of sign.

Figure 1a. Distribution of t under Null Hypothesis (Curve A) and under
Alternative (Curve B). Asymmetrical Test. Rejection Areas Shaded.
$1 - P_2$ Less Than .50.

Figure 1b. Distribution of t under Null Hypothesis (Curve A) and under
Alternative (Curve B). Asymmetrical Test. Rejection Areas Shaded.
$1 - P_2$ Equal to .50.

Curve $B_1$, Figure 1a (mean $\bar{x}_1$), is drawn to show the case when
$P_2 > .50$; $B_2$, Figure 1b (mean $\bar{x}_2$), is drawn to show the case
when $P_2 = .50$; and $B_3$, Figure 1c (mean $\bar{x}_3$), to show when
$P_2 < .50$. The shaded area of the right tail of curve A is the
area of rejection. Whenever we get a value of t equal to or
greater than $t_1$ we reject the null hypothesis. When we get a
value of $t < t_1$ we accept the null hypothesis. Now if we have
really sampled from a population with mean $\bar{x}_1$, we shall be
making an error of the second kind whenever we get a sample t
less than $t_1$. The probability of this is simply the area under
curve $B_1$, Figure 1a, to the left of $t_1$, and since $t_1$ is to the right
of $\bar{x}_1$ this area is greater than .50; i.e., $P_2 > .50$. The exact area
is easily computed. We find from tables of t the area to the
right of $t_1 - \bar{x}_1$ and subtract from unity to get the area to the left
of this point. Tables of t are usually given for fixed values of p,
so the exact value of $P_2$ may be difficult to find if the total degrees
of freedom ($n_1 + n_2 - 2$) is less than 30. If $(n_1 + n_2 - 2) > 30$,
normal curve tables may be used and interpolation will be
unnecessary. $P_2$ may be pretty closely approximated, even if
$(n_1 + n_2 - 2) < 30$, by interpolation in the t table. Assume
for example that $\bar{x}_1$ is 1.2 t units* above $\bar{x}_0$, that $n_1 + n_2 - 2 = 15$
and that $P_1 = .05$; $t_1$ will then be 1.753 (found from a t table
such as that in Fisher and Yates,⁴ using half the tabled P values
since the tabled values are for two tails). We need to find the
probability of a value of t greater than 1.753 − 1.200 = .553.
From the same tables we find t = .536 corresponds to p = .30
(written $t_{.30} = .536$) and $t_{.25} = .691$. Linear interpolation
would give approximately $t_{.29} = .553$, so $P_2$ is equal to 1 − .29
= .71. This is the probability that if D = 1.2 t units the null
hypothesis will be accepted. That is to say that in a large
number of experiments in which D = 1.2t, only twenty-nine per
cent will lead to rejection of the null hypothesis.

* A t unit is equal to the denominator of t, i.e., $s\sqrt{(n_1 + n_2)/(n_1 n_2)}$.

Figure 1c. Distribution of t under Null Hypothesis (Curve A) and under
Alternative (Curve B). Asymmetrical Test. Rejection Areas Shaded.
$1 - P_2$ Greater Than .50.

Figure 2. Power Curve of t Test. Degrees of Freedom ($n_1 + n_2 - 2$) = 15.
$P_1 = .05$. Curve A Is Asymmetrical Test. Curve B Is Symmetrical Test.
The Unit of the Abscissa Is the Denominator of t, $s\sqrt{(n_1 + n_2)/(n_1 n_2)}$.
[Ordinate: probability of rejecting null hypothesis ($P = 1 - P_2$).
Abscissa: true difference between means (D).]
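
The interpolation in the worked example can be written out in a few lines. The sketch below follows the article's own approximation (the alternative curve is treated as the tabled t curve shifted by D) and uses only the tabled values quoted in the text for fifteen degrees of freedom; the function name `interpolate_area` is a hypothetical one.

```python
def interpolate_area(t_value, lower, upper):
    """Linear interpolation of the one-tailed area p between two (t, p) table entries."""
    (t_low, p_low), (t_high, p_high) = lower, upper
    fraction = (t_value - t_low) / (t_high - t_low)
    return p_low + fraction * (p_high - p_low)


T1 = 1.753   # one-tailed .05 cutoff for 15 degrees of freedom
D = 1.2      # true difference in t units

# Area of the shifted curve to the right of t1, i.e. beyond t1 - D = .553.
power = interpolate_area(T1 - D, (0.536, 0.30), (0.691, 0.25))
p2 = 1 - power

print(round(power, 2))  # 0.29
print(round(p2, 2))     # 0.71
```

The same function applies whenever the required t value falls between two tabled entries; only the bracketing pairs change.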

Now consider curve $B_2$ (Figure 1b). This has been drawn
so that its mean, $\bar{x}_2$, coincides with $t_1$. The probability of
making an error of the second kind must therefore be exactly
.50, since half the curve lies to the left of $t_1$ and half lies to the
right of $t_1$.

Now let us consider curve $B_3$ (Figure 1c). We shall make an
error of type 2 whenever we get a t value less than $t_1$, which is
($\bar{x}_3 - t_1$) units below $\bar{x}_3$. As an example assume, as before, that
$n_1 + n_2 - 2 = 15$. Assume $\bar{x}_3$ = 3 t units above $\bar{x}_0$. 3.000 −
1.753 = 1.247. From the t tables we find $t_{.15} = 1.074$ and
$t_{.10} = 1.341$, so to two places $t_{.12} = 1.247$, or $P_2 = .12$.

⁴ R. A. Fisher and F. Yates, Statistical Tables, London: Oliver and Boyd,
1938.

Notice that as D went from 1.2t to $t_1$ to 3t (Figures 1a, 1b, 1c),
$P_2$ went from .71 to .50 to .12. If we take a large number of
values of D and calculate the $P_2$ value we can plot a curve of D
against $P_2$. Figure 2, curve A, shows such a curve plotted. It
plots the probability of ‘rejecting’ the null hypothesis as
ordinate (i.e., it plots $1 - P_2$) against D as
abscissa. ($P_2$ is the probability of accepting a false hypothesis.)
By plotting $1 - P_2$ we are plotting the ‘power’ of the test.
The curve is known as the ‘power curve.’ The values from
which Figure 2 was plotted were obtained by getting D for
the values of p given in the tables of Fisher and Yates. This
eliminates the need for interpolation. The tables of Fisher
and Yates give p values for the symmetrical test. These must
be halved for the asymmetrical test. Table I shows how some
of the values for curve A of Figure 2 were calculated.


TABLE I. COMPUTING SHEET FOR CURVE A OF FIGURE 2,
SHOWING SOME SAMPLE VALUES

P (= 1 − P2)                   t0        D (= t1 − t0)
(Column headings in                      (t1 is the value of
t table divided by 2.)                   t for the chosen P1.)

     .01                      2.602         −0.849
     .05                      1.753          0.000
     .10                      1.341          0.412
     .30                      0.536          1.217
     .50                      0.000          1.753
     .70                     −0.536          2.289
     .90                     −1.341          3.094

Notice that at D = 0, P is equal to $P_1$. This, of course, is
necessary since we are plotting the probability of rejecting, and
when we reject at D = 0 we are making an error of the first kind.

Notice also that curve A to the left of D = 0 is dashed. This
is to draw attention to the fact that when using the asymmetrical
test it is assumed that D > 0. This portion of the curve is
shown to indicate how poor the asymmetrical test is when D
can be less than 0. It is poor in the sense that the null hypothesis
has small probability of being rejected when D < 0.

We now have a criterion for choosing a test of statistical
significance. Of two tests that have equal value of $P_1$, we choose
the one with the greatest ‘power,’ i.e., the one which is most
likely to lead to rejection of the null hypothesis when the null
hypothesis is false. It is possible that one test might be more
powerful for certain alternative hypotheses and another test
might be more powerful for others, so in some cases it is necessary
to consider what types of alternatives we are most desirous of
detecting. In the asymmetrical test, for example, we have very
little probability of detecting a negative value of D. A very
complete discussion of various types of power curves will be
found in the article by Neyman and Pearson.⁵

⁵ J. Neyman and E. S. Pearson, “Contributions to the Theory of Testing
Statistical Hypotheses,” Statistical Research Memoirs, 1, 1936.

Figure 3. Distribution of t under Null Hypothesis (Curve A) and under
Alternative (Curve B). Symmetrical Test. Rejection Areas Shaded.
$1 - P_2$ Less Than .50.



Now we can consider the slightly more difficult case of the
symmetrical t test. In this test the null hypothesis is rejected
when the absolute value of t is large. For purposes of illustration
we shall consider the t test based on fifteen degrees of freedom
with the level of significance at $P_1 = .05$. With the level of
significance at $P_1 = .05$, we reject the null hypothesis whenever
we get a value of t greater than 2.131 or less than −2.131, i.e.,
whenever |t| > 2.131. Figure 3 shows the t curve for the null
hypothesis (curve A) and for an alternative hypothesis (curve B).

The probability of rejecting the null hypothesis when the true
mean is $\bar{x}_1$ is equal to the area under curve B to the right of $t_1$
and to the left of $-t_1$. It is this little area to the left of $-t_1$
that makes the power curve for the symmetrical test slightly more
difficult to compute. To reduce the need for interpolation, the t
values from the t table are used for the P’s given there, as in
Table I, and the D values are computed as in that table. The
area to the left of $-t_1$ must now be computed by interpolation
and added to the other P value. This sum is the power of the
test. The power curve will be symmetrical around D = 0,
so only positive values of D need be computed. Table II shows
the computing sheet for certain values of D.


TABLE II. COMPUTING SHEET FOR CURVE B OF FIGURE 2,
SHOWING SOME SAMPLE VALUES

P (= 1 − P2)        t0         D       D + t1        P′          P + P′
(area to right                                  (area to left    (Power)
of t1)                                           of −t1)

    .025          2.131      0         2.131       .025          .05
    .05           1.753       .378     2.509       .01           .06
    .10           1.341       .790     2.921       .005          .105
    .15           1.074      1.057     3.188      <.005          .15
    .30           0.536      1.595                               .30
    .50           0.000      2.131                               .50
    .70          −0.536      2.667                               .70
    .90          −1.341      3.472                               .90


It will be noticed from Table II that the area to the left of
$-t_1$ becomes less than .005 by the time D = 1.057, so after that
value of D, P′ may be neglected.
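
The rows of Table II may be retraced as follows. This is a sketch only: it assumes the shifted-curve approximation used throughout the article, takes the one-tailed t values for fifteen degrees of freedom from the standard table (2.131, 2.602, and 2.947 for areas .025, .01, and .005), and sets P′ to zero beyond the last tabled value, in keeping with the remark that it may be neglected. The names `FAR_TABLE`, `far_tail_area`, and `power_row` are hypothetical.

```python
T1 = 2.131  # two-sided .05 cutoff for 15 degrees of freedom

# (t value, one-tailed area beyond it) for 15 degrees of freedom.
FAR_TABLE = [(2.131, .025), (2.602, .01), (2.947, .005)]


def far_tail_area(cutoff):
    """Interpolated area beyond `cutoff`; taken as negligible past the last entry."""
    for (t_low, p_low), (t_high, p_high) in zip(FAR_TABLE, FAR_TABLE[1:]):
        if t_low <= cutoff <= t_high:
            fraction = (cutoff - t_low) / (t_high - t_low)
            return p_low + fraction * (p_high - p_low)
    return 0.0


def power_row(p_near, t0):
    """Return D, D + t1, P' and the power P + P' for one row of the sheet."""
    d = T1 - t0
    p_far = far_tail_area(d + T1)
    return d, d + T1, p_far, p_near + p_far


for p_near, t0 in [(.025, 2.131), (.05, 1.753), (.10, 1.341), (.15, 1.074)]:
    d, far_cutoff, p_far, power = power_row(p_near, t0)
    print(f"{p_near:.3f}  {d:5.3f}  {far_cutoff:5.3f}  {p_far:.3f}  {power:.3f}")
```

The far-tail area is computed at D + $t_1$ because, by symmetry, the area of the shifted curve to the left of $-t_1$ equals the area of the tabled curve beyond D + $t_1$.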

The power curve for the symmetrical t test is drawn in Figure 2
as curve B. Notice that the B curve lies below the A curve
throughout the range of positive D values. Hence, if positive D
values are the ones we want to detect, the asymmetrical test
is the better. But if we also want to detect negative D values
we should use the symmetrical t test, as is obvious from comparing
curve B and curve A for negative values of D.

It will be noticed that the unit of measurement for the abscissa
in Figure 2 is the denominator of t, i.e.,

\[ s\sqrt{\frac{n_1 + n_2}{n_1 n_2}} \]

where s is the estimate of the standard deviation.
The power of the test can therefore be increased in terms of
original units by decreasing the size of $s\sqrt{(n_1 + n_2)/(n_1 n_2)}$. This may
be done by reducing the size of s (by increased precision in the
measurements, for example), or by increasing $n_1$ and/or $n_2$, or by
all of these at once.

Let us recapitulate at this point what we have developed above.
There are two kinds of errors possible when making a test of
significance. $P_1$ is the probability of a type 1 error, rejecting a true
hypothesis. $P_2$ is the probability of a type 2 error, accepting a
false hypothesis. $P_1$ is usually fixed for any given test. For any
given true difference, D, $P_2$ may be easily computed. $P_2$
decreases as D increases, the relationship being clearly shown by the power
curve of the test. The four variables are $P_1$, $P_2$, D and N (the
degrees of freedom). As soon as three of these are given, the
other is fixed. We have considered above the case where $P_1$
and N were fixed and have shown how D and $P_2$ were related.
This is the usual consideration after an experiment has been
performed and t found not significant.


ESTIMATING N, KNOWING P1, P2, AND D


A far more efficient attack is to consider P1, P2, D and N before
the experiment is started. P1, P2, and D (= D1) can be fixed
by consideration of the experimental problems involved, D being
fixed at some value it is believed important to detect. Then we
can compute N to give these values. When we do this figuring
before we start we are insured against wasting time and
experimental material on an experiment which is unlikely to give
results of the necessary precision. That is, we can fix two
points on the power curve (P1 at D = 0) and (P2 at D = D1)
and find the N that will give a curve through these two points.
In an actual problem we are not interested in D, but in d, the
difference in original units, or in d' (= d/(s√2)), the difference
in units of s√2. The relationship between D, d, and d' is:

$$ D = \frac{d}{s\sqrt{2}/\sqrt{n}} = \frac{d\sqrt{n}}{s\sqrt{2}} = d'\sqrt{n} $$

where n is the number of cases in each group. That is, we
assume n1 = n2 = n. This assumption simplifies the work.
Now we note that for a fixed D, d' decreases as √n increases, or
conversely, for a fixed d', D increases as √n increases. The
shapes of the curves in Figures 1a, 1b, 1c and 3 are independent
of n,* but as n changes, the scale of the abscissa changes in terms
of d' units. For example, for D = 2 and n = 16 we have
d' = 1/2; but for D = 2 and n = 36 we have d' = 1/3. In other
words, the probability of detecting a given d' will increase as n
increases. The problem of determining n therefore is seen to
become simply the problem of determining the scale such that
we get the desired probability of rejecting the null hypothesis
when we have a difference d'.
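The change of scale described above follows directly from D = d'√n. A minimal sketch, assuming two groups of n cases each as in the text (the function names are illustrative only):

```python
import math


def d_prime_from_D(D, n):
    """d' = D / sqrt(n): the difference in units of s*sqrt(2)."""
    return D / math.sqrt(n)


def D_from_d_prime(d_prime, n):
    """Inverse relation: D = d' * sqrt(n)."""
    return d_prime * math.sqrt(n)


# The two cases mentioned in the text: D = 2 with n = 16 and n = 36.
for n in (16, 36):
    print(n, d_prime_from_D(2.0, n))
```

For a fixed d', calling `D_from_d_prime` with a larger n gives a larger D, which is why a given difference becomes easier to detect as n grows.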

If in Figure 1a we let t1 be the distance from x̄0 to t1 and t2 be
the distance from x̄1 to t1, we have the fundamental relationship:

$$ D = t_1 - t_2 \qquad (1) $$


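Now we express D in terms of s and d and d' as before:

$$ D = \frac{d}{s\sqrt{2}/\sqrt{n}} = \frac{d\sqrt{n}}{s\sqrt{2}} = d'\sqrt{n} $$

D = t1 − t2 now becomes

$$ d'\sqrt{n} = t_1 - t_2 \qquad (2) $$

We solve this for n and get

$$ n = \frac{(t_1 - t_2)^2}{(d')^2} \qquad (3) $$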





* Except for the approach to normality of the t curve as N increases.
There is little change after N (= 2n − 2) = 30.










If we are using the symmetrical test we have exactly the same
formula for n. The only difference is that t1 is found for P1
instead of ½P1. That is, we use the p given in the t table
instead of halving it. In general, P2 will be greater than .50, in
which case the area of B (Figure 3) to the left of −t1 is negligible.

If one distribution is being compared with a standard (as in
the t test when the individual measures occur in pairs and we are
testing the distribution of differences to see if it is consistent
with the hypothesis that the mean difference is 0), the formula
remains the same, but d' = d/s, where s is the estimate of the
standard deviation of the distribution of differences. The
factor √2 has been absorbed in the taking of differences between
individual measures.

We still have one slight problem to solve before we can use the
formula for n. We don't know which line of the t table to use in
looking up t1 and t2 from P1 and P2 until we know n, and that is
what we are looking for. It is therefore necessary to use a
process of successive approximations, though actually the second
approximation gives n as close as necessary for practical work.
Since this n will always be slightly too high, it will be a
conservative estimate of n. An efficient way to start is to use the
N = 30 line of the t table to find t1 and t2.

Remember that for a comparison of two groups of n cases 
each, the N of the t table is 2n − 2. So if we find n = 16 from
using N = 30, we need no further approximations. If n < 16 
when N = 30 then we should repeat the calculation of n from 
formula (3) using N = 2n — 2. The next larger integer than 
this value of n will be large enough for n. 

Let us do an example using formula (3). We shall set P1 = .05,
1 − P2 = .80 and d' = .75. From the t table we find for a
symmetrical test (N = 30) t1s = 2.042, and for the asymmetrical
test (N = 30) t1a = 1.697 (the subscript s refers to the
symmetrical case and the subscript a refers to the asymmetrical
test). t2 for both tests is found from the p = .20 column (or,
if using the tables of Fisher and Yates, the p = .40 column).
This gives t2 = −.854. (Remember that t is negative when
it is to the left of the mean, i.e., whenever P2 < .50.)

Formula (3) then gives for the symmetrical test (using n1
for the first approximation)

$$ n_1 = \frac{(2.042 + .854)^2}{(.75)^2} = 14.9 $$

To make the second approximation we first find N2 (= 2n1 − 2).
This is (2)(14) − 2 = 26 (we use the next 'smaller' integer
value of n1 when finding N2).

$$ t_{1s} = 2.056, \qquad t_{2s} = -.856 $$

$$ n_2 = \frac{(2.056 + .856)^2}{(.75)^2} = 15.1 $$

The next higher integer, 16, should be used as n.

For the asymmetrical test we get

$$ n_1 = \frac{(1.697 + .854)^2}{(.75)^2} = 11.6 $$

Now we look up t1a for N = 2(11) − 2 = 20 and find

$$ t_{1a} = 1.725, \qquad t_{2a} = -.860 $$

$$ n_2 = \frac{(1.725 + .860)^2}{(.75)^2} = 11.9 $$

The next higher integer, 12, should be used as n.
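The two passes of the worked example can be retraced in a few lines. This is only a sketch of the arithmetic in formula (3): the t values are the ones quoted in the text from a printed t table and are typed in by hand, and the helper names are illustrative.

```python
import math


def n_from_formula_3(t1, t2, d_prime):
    """Formula (3): n = (t1 - t2)^2 / (d')^2, with t2 carrying its sign."""
    return (t1 - t2) ** 2 / d_prime ** 2


def df_for_next_pass(n_estimate):
    """N = 2n - 2, using the next smaller integer value of n."""
    return 2 * math.floor(n_estimate) - 2


d_prime = 0.75

# Symmetrical test: first pass on the N = 30 line, second pass on N = 26.
n1_s = n_from_formula_3(2.042, -0.854, d_prime)
N2_s = df_for_next_pass(n1_s)
n2_s = n_from_formula_3(2.056, -0.856, d_prime)

# Asymmetrical test: first pass on the N = 30 line, second pass on N = 20.
n1_a = n_from_formula_3(1.697, -0.854, d_prime)
N2_a = df_for_next_pass(n1_a)
n2_a = n_from_formula_3(1.725, -0.860, d_prime)

print(round(n1_s, 1), N2_s, round(n2_s, 1), math.ceil(n2_s))
print(round(n1_a, 1), N2_a, round(n2_a, 1), math.ceil(n2_a))
```

Looking up t1 and t2 for each new N remains a manual table step here; automating it would require a t-distribution routine, which the sketch deliberately leaves out.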

If we find after using formula (3) that the required n is more
than we can use (because of availability of cases or for economic
reasons) it is necessary to revise our demands with respect to P1,
P2 or d'. Usually P1 is kept fixed. It is usually good practice
to keep d' fixed at some value which it is believed important to
detect. The decision as to d' is, of course, an experimental rather
than a statistical problem. Then for the largest n considered
practical we can calculate P2. If this value is too large to meet
the experimental requirements we can try to find some way to
reduce s, possibly by more rigorous experimental controls.

Some useful tables will be found in an article by J. Neyman
and B. Tokarsa.⁶ These tables are used to find D for a given N
and P2. There is a table for P1 = .05 and one for P1 = .01.
In both tables N goes by ones from 1 to 30 and P2 has values
.01, .05, .10 and thence by multiples of .10 to .90. These tables
are to be used when the N is known and D and P2 are of interest.
These tables will save some time, but the ordinary t tables take
very little more time to use. The article is quite interesting,
however, for its own sake.

6 J. Neyman and B. Tokarsa, "Errors of the Second Kind in Testing
Students' Hypothesis," Journal of the American Statistical Association, 31,
1936, 318-326.

If we have three or more groups to compare, so that we use the
analysis of variance instead of the t test, the problem of the
power of the test becomes rather more difficult since D cannot be
immediately expressed, there now being at least three
pair-differences. Discussion of the power function for tests of
analysis of variance will be found in papers by Lehmer⁷ and by
Tang.⁸ The method described above for t may be substituted
for the methods discussed in these references by considering a
single degree of freedom of the total degrees of freedom between
the group means. If we have three means, x̄1, x̄2, x̄3, to be
compared, we might consider two of these means that we feel are
very important and compute n as if we had only these two groups.
Then if we use n in each of the three groups we are sure to get
the precision we want for the pair comparison. If such a pair
cannot be picked out, then the more complex methods described
in the articles by Lehmer and Tang should be used. Once the
concept of the power function of the t test has been mastered,
the papers by Lehmer and Tang will not be difficult to understand.

It is strongly recommended that whenever an experiment
yields a t which is nonsignificant, a power curve be plotted so
that the nonsignificance of t may be properly interpreted. The
scale of the abscissa should be d' or d, not D. If the number of
degrees of freedom is small, the test will not be very powerful,
and the nonsignificance may not be a very important finding,
since even large values of d' might often lead to nonsignificant t
values. As a minimum, it is recommended that the power of
the test be computed for d' equal to the experimentally obtained
difference.





7 Emma Lehmer, "Inverse Tables of Probabilities of Errors of the Second
Kind," Annals of Math. Stat. 15, 1944, 388-398.

8 P. C. Tang, "The Power Function of the Analysis of Variance Tests
with Tables and Illustrations of Their Use," Statistical Research Memoirs, 2,
1938, 126-149.











THE RELATION BETWEEN IQ 
AND TRAIT DIFFERENCE AS MEASURED 
BY GROUP INTELLIGENCE TESTS¹


J. W. TILTON 


Yale University 


Over a long period, the relation between scatter and intelligence
quotient has been a topic for investigation.² It continues to
claim attention. McNemar³ has reported upon the relationship
as found with the 1937 revision of the Stanford-Binet. But the
group-intelligence-test counterpart of scatter has not been
frequently studied. Woodrow⁴ reported an extensive analysis
using the results of the Arthur and Woodrow group intelligence
scale, and holding mental age constant. He refers to prior
studies by Maud A. Merrill,⁵ J. C. DeVoss,⁶ and A. W. Brown,⁷
but the Merrill study was concerned with Binet scatter, DeVoss
studied achievement test scores, and Brown held chronological
age constant, not mental age.

An examination of the Woodrow data reveals a need for further
study. After reporting results at eighteen half-year mental-age
levels from 8:0 to 16:11, Woodrow averaged his findings and
reported that unevenness is at a minimum for average pupils, and
that it is greater for both brighter and duller pupils but not as
great for the bright as for the dull. A condensation of his
findings is shown in Table I.

1 Grateful acknowledgement is made to H. M. Foulds, Jr. and Howard
Stoertz for assistance with the computation; to the Yale University
Committee on Bursary Appointments for making this assistance possible; and to
J. C. Caughlan, Norma E. Cutts, W. N. Durost, World Book Company,
and J. W. Wrightstone, for help in securing data.

2 A. J. Harris and D. Shakow, "The Clinical Significance of Numerical
Measures of Scatter on the Stanford-Binet," Psychological Bulletin, 1937,
34: 134-150.

3 Quinn McNemar, The Revision of the Stanford-Binet Scale. Boston:
Houghton Mifflin Company, 1942.

4 Herbert Woodrow, "Mental Unevenness and Brightness," Journal of
Educational Psychology, 19: 289-302.

5 Maud A. Merrill, On the Relation of Intelligence to Achievement in the
Case of Mentally Retarded Children. Comparative Psychology Monographs,
1924, vol. 2, no. 10.

6 L. M. Terman, Genetic Studies of Genius. Stanford University:
Stanford University Press, 1925, chap. 12.

7 A. W. Brown, Unevenness of the Abilities of Dull and of Bright Children.
New York: Teachers College Contributions to Education #220, 1926.


TABLE I.—A CONDENSATION OF WOODROW'S DATA


IQ Unevenness 
above 104 106 per cent 
96-104 100 per cent by definition 
below 96 111 per cent 


An inspection of the full table led the writer to question the
validity of the conclusion, as based on an averaging of all of the
data for the eighteen different mental-age levels. The data for
the four levels from 10:0 to 11:11 showed less unevenness for the
high IQ pupils than for those of average IQ.¹ In other words,
Woodrow's conclusion concerning bright pupils was contrary to
his data for mental ages 10:0-11:11. It rested upon the weight
of the data for the lower and higher mental ages, especially the
latter.

1 If it is maintained that the term IQ should not be used for the results of
group testing, the writer must plead guilty. Most references in this paper
to 'mental age' are to the normative equivalents of group test scores, and
most references to 'IQ' are to ratios of group test 'mental ages' to
chronological ages.

There are reasons for placing more confidence in Woodrow's
data at the 10:0-11:11 levels than in that for the higher and lower
levels. A single test yielded the data from 8:0 to 16:11. Woodrow
assures the reader that the test was adequate throughout
because it differentiated at all the levels at which it was used, but
this is not a sufficient guarantee of its adequacy for a comparative
study of profile unevenness. The question is whether or not any
skewness was introduced into any of the subtest scores at the
bottom or top levels. If it was, there is an uncertainty as to
how the resulting spurious unevenness would affect the
comparisons. It is at the 10:0-11:11 levels that this factor is least
likely to have interfered with the purpose of the investigation.
In the second place, the sampling of the IQ range is best at these
levels. At the other levels the range is much restricted, at the
high end for the low mental ages, and at the low end for the high
mental ages. In the latter case there were twenty-five cells in
which there were no entries. Hence, in averaging for the whole
table Woodrow was very largely comparing the bright at high
mental ages with the dull at the low levels. A respect in which
the data in the upper ages may be inferior to that at the lower
levels is in the possible influence of sex difference, and another
reason for questioning a conclusion influenced heavily by the
data for the high mental ages is that probably his IQ's were based
on actual chronological ages up to 16:0.¹ Terman and Merrill²
reported with their revision that chronological ages above 13:0
should be discounted in order to make IQ's comparable. Lacking
this modification, Woodrow's data for the upper mental age
levels are not comparable with those for the lower levels.


1 This is an inference based upon the facts that sixteen pupils with mental
ages above 14:5 were tabulated as having IQ's below 96, and fifteen with
mental ages above 15:11 were tabulated as having IQ's below 105.

2 L. M. Terman and Maud A. Merrill, Measuring Intelligence. Boston:
Houghton Mifflin Company, 1937, pp. 29-31.


PRESENT STUDY


The present study was undertaken in order to eliminate some
of the uncertainty concerning the validity of Woodrow's
conclusions. The intention was to make a similar analysis, avoiding
so far as possible those aspects of procedure which may have
allowed uncontrolled factors to affect the results. Woodrow's
procedure of holding mental age constant within half-year
intervals was followed between 10:0 and 11:11. At the other
levels it was held constant within twelve-month intervals. This
greater interval was used in part because of the smallness of the
populations tested and in part because at the higher levels, the
raw-score range for a whole year interval was so limited as to
make half-year intervals unnecessary. In all cases Woodrow's
measure of unevenness was used; namely, for each pupil, the
average deviation of his subtest scores from his own mean. With
one exception (Kuhlmann-Anderson data) scores were converted
into sigma scores. The departures from Woodrow's procedure
were introduced as safeguards for validity. Where in the
Woodrow study the sigmas for the corresponding chronological-age
groups were used in computing sigma scores, in the present
study the sigmas used were those for the mental age groups
before they were separated into IQ subgroups. In the one case
in which a generous amount of data were available (National
Intelligence Test data) with average IQ defined as 95-105
inclusive, a child with lower IQ was matched with one whose IQ
was as much above 100 as the first child was below. At all levels,
data were computed for boys and girls separately. Perhaps
most important of the safeguards were (a) the use in each case
of data from only a small fraction of the mental-age range which
the test was supposed to measure, and (b) the choice of this small
fraction from the middle of the advertised range.
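The unevenness measure described above (for each pupil, the average deviation of his subtest scores from his own mean, after conversion to sigma scores) can be sketched as follows. The subtest means and sigmas in the usage lines are made-up illustrative numbers, not data from the study, and the function names are hypothetical.

```python
from statistics import mean


def to_sigma_scores(raw_scores, subtest_means, subtest_sigmas):
    """Convert raw subtest scores to sigma (standard) scores.

    The means and sigmas are assumed to come from the pupil's mental-age
    group, as in the present study.
    """
    return [(x - m) / s
            for x, m, s in zip(raw_scores, subtest_means, subtest_sigmas)]


def unevenness(sigma_scores):
    """Average absolute deviation of a pupil's scores from his own mean."""
    own_mean = mean(sigma_scores)
    return mean([abs(z - own_mean) for z in sigma_scores])


# Hypothetical pupil with two subtests (illustrative numbers only).
profile = to_sigma_scores([12, 20], [10, 25], [4, 5])
print(profile, unevenness(profile))
```

A perfectly flat profile gives an unevenness of zero; the more the sigma scores spread around the pupil's own mean, the larger the value.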

The objectives of the present study were three: first, to find 
out whether within the 10:0-11:11 range, the trend of the Wood- 
row data may be said to be reliably established; second, to find 
out, with more dependable data, what the relation is for mental 
age 8; and third, also with less ambiguous data, what the relation 
is for mental ages 13-15. 


DATA AND RESULTS 


Data were secured from five testing programs, as follows:

1) National Intelligence Test, Scale A. This was the first
set of data obtained. The purpose was to test the reliability
of the relationship shown by Woodrow's 10:0-11:11 data. These
data were available in the files of the Yale Department of
Education, the results of a community-wide testing program in
grades III to VIII inclusive. This was the most extensive set of
data available and, hence, results are shown in detail in Table II.
Defining the unevenness of the average IQ pupils as 100, the
unevenness of the dull was one hundred seventeen per cent and
the unevenness of the bright was eighty-nine per cent. If
Woodrow had reported similar figures for mental ages 10:0-11:11
he would have reported one hundred nine per cent and ninety-three
per cent for the dull and bright, respectively. It seems
that Woodrow's data at the same mental-age levels were not
chance variants from the trend he reported for his whole table.
In the data of Table II, the .680 is lower than the .764 by 3.3
times the standard error of the difference. The .893 is larger
than the .764 by 4.4 times the error of the difference. Both the
Woodrow data and the data of Table II suggest a greater
difference between dull and average than between bright and average.
However, the difference between these differences for Table II
data is only slightly larger than its error, and among the girls
the bright are farther below the average than are the dull above
them.
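The percentages and critical ratios quoted for the National Intelligence Test data can be approximated from the bottom rows of Table II. The sketch below assumes that the standard error of a difference between two independent group means is the square root of the sum of the squared standard errors; the article does not state which formula was used, so the ratios it yields (roughly 3.4 and 4.3) come out near, but not identical to, the printed 3.3 and 4.4. The function names are illustrative.

```python
import math


def percent_of_average(group_mean, average_mean):
    """Express a group's mean unevenness as a per cent of the average group."""
    return 100.0 * group_mean / average_mean


def critical_ratio(mean_a, se_a, mean_b, se_b):
    """Difference between two means divided by the SE of the difference.

    Assumes independent groups: SE_diff = sqrt(se_a^2 + se_b^2).
    """
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (mean unevenness, SE) for mental ages 10:0-11:11, boys and girls combined.
dull, average, bright = (0.893, 0.023), (0.764, 0.019), (0.680, 0.016)

print(round(percent_of_average(dull[0], average[0])))
print(round(percent_of_average(bright[0], average[0])))
print(critical_ratio(dull[0], dull[1], average[0], average[1]))
print(critical_ratio(average[0], average[1], bright[0], bright[1]))
```

The first two printed values correspond to the one hundred seventeen and eighty-nine per cent figures in the text.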
















TABLE II.—AVERAGE UNEVENNESS SCORES AT MENTAL AGES
10:0-11:11, NATIONAL INTELLIGENCE TEST DATA
(Numbers of cases in parentheses)

Mental age    Sex        Unevenness for   Unevenness for   Unevenness for
                         IQ's below 95    IQ's 95-105      IQ's above 105

10:0-10:5     B          .973 (18)        .787 (26)        .551 (18)
10:0-10:5     G          .912 (15)        .785 (20)        .712 (15)
10:0-10:5     B and G    .945 (33)        .786 (46)        .624 (33)
10:6-10:11    B          .868 (40)        .793 (44)        .707 (40)
10:6-10:11    G          .891 (55)        .830 (42)        .641 (55)
10:6-10:11    B and G    .881 (95)        .811 (86)        .669 (95)
11:0-11:5     B          .976 (32)        .680 (40)        .717 (32)
11:0-11:5     G          .871 (43)        .761 (38)        .656 (43)
11:0-11:5     B and G    .916 (75)        .748 (78)        .682 (75)
11:6-11:11    B          .859 (42)        .745 (45)        .702 (42)
11:6-11:11    G          .849 (23)        .739 (31)        .758 (23)
11:6-11:11    B and G    .855 (65)        .743 (76)        .722 (65)
10:0-11:11    B          .906 (132)       .749 (155)       .687 (132)
10:0-11:11    G          .880 (136)       .782 (131)       .673 (136)
10:0-11:11    B and G    .893 (268)       .764 (286)       .680 (268)
SE                       ± .023           ± .019           ± .016





2) Pintner General Ability Test, Verbal Series—Intermediate
Test: Form B. This set of data, designed to supplement the
National Intelligence Test data, was more limited both as to
population and grade range. It was the result of testing in
grades V to VIII inclusive, and hence was adequate for an
analysis of only mental ages 11:0-11:11. The percentages are
101 for the dull and 95 for the bright. For the National and
Pintner data combined, the weighted average percentages are
112 for the dull and 88 for the bright.

3) Detroit Primary Intelligence Test—For Grades II, III, and
IV, Form C. Scores on this test obtained in grades II, III, and
IV were analyzed for mental ages 8:0-8:11. The unevenness in
the profiles of the dull was 103 per cent of that in the profiles
of the children with average IQ's. The percentage was 99 for
the bright.

4) Kuhlmann-Anderson Tests, Fifth Edition, Grade II. This
set of data was also used for mental ages 8:0-8:11, but it was less
satisfactory for the purpose, in that the data came from only one
grade, and that grade the second, late in the year. Grade III
would have been better. The logical procedure would have
been to select lower mental ages, but this was not feasible. The
parts of this Kuhlmann-Anderson battery by design differ
markedly in difficulty. The profile graph inside the cover page shows
that the first four tests might be used in a study like this for
mental age 6, the fifth and sixth tests are suitable for mental
age 7, and the last four are usable at mental age levels 8:0-8:11.
Among the population tested there were very few 6's. The
number of 7's was not large and the number of suitable tests at this
level too few. The best data which this testing program yielded
for the purpose of this study were for mental ages 8:0-8:11 as
measured on the last four tests, numbered in the battery as 14,
15, 16, and 17. Second-grade pupils with a mental age of 8 are
of above-average brightness. A representative sample would
have been preferred, but the data were available and were used.
The scores were not converted into sigma scores. Since each
of the separate tests is standardized with mental age equivalents,
it was thought that conversion into sigma scores would not have
made the scores more comparable.

The percentages were 101 and 93 for dull and bright,
respectively. Weighted average percentages for the two sets of data
for mental ages 8:0-8:11 are 103 for the dull and 95 for the
bright.

5) Terman-McNemar Test of Mental Ability—Form C. This
set of data came from the testing of a six-year junior-senior high
school. IQ's were taken from the Terman-Merrill Table,¹ and
sorted as before into three categories, 'below-average,' 'average'
and 'above-average.' Calling the unevenness of the profiles of
the pupils of average ability equal to one hundred per cent as
before, the unevenness of the dull was one hundred fifteen per
cent and that of the bright was ninety-seven per cent.

1 L. M. Terman and Maud A. Merrill, Measuring Intelligence. Boston:
Houghton Mifflin Company, 1937. Appendix, p. 417.























































(€18) (289) (699) WYZUIq pus ssvI0AB ‘T[Np 10} 
Z6 OOT OT SOL10Z9489 9aIY} A[UO Ul BYVp SUIBS 9], 
(162) (ZZS) (289) (68F) (022) 
0 16 8 26 0 OOT 8 90T T' LUT pourquiod Byep [TV 
(09) (#8) (26) (ZOT) (FL) 
1°16 6° £6 | 0 001 | 0° SOT (amas! LE 11-0:01 10} ByBp S,MOIPOOM 
(1¥Z) (SEP) (¢g¢) (2€8) (9FT) S, MOIPOOM 
0' 16 9°26 0 OO0T € LOL Cc’ stit 0} IBIIWIS Sol10Z0}89 UI BYBp UTES OT, 
(9) (OZ) =| (88h) (¢g¢) (288) (221) (61) 
€¢o8 | 026 | 9°26 | OOO | € 201 | 9 LIT | F Fer sedvyuso1ed VFvIOAG pozYyTIOM 
(9) (62) (29) (ZOT) 
9° 18 2 001 @ L6 0° OOT Il:SI-O: FI IBUION OP-UBULIOT, 
(Z) (ST) (#9) (ZF) 
6 ZO0I b 16 0 OOT Cc Fil IT-€1-0' €1 IBVUL9 NOP -UBvUla 7 
(6) (88) (29) (29) (02) () 
1 26 b S6 0 OOT €° 10l ¢ O01 8 OOT | IT-TI-T IT JeuyUul 
(GZ) (C6) |(TZT) |(89%) = | (P9T) (88) (91) 
L198 b 28 & 18 0 OOT 0 OLT 0 FI 6° S23 | IT ITI-0:01 [BUOT}VN 
(¢) (64) (06) (TP) (OT) (g) 
6° 28 ¢ 26 I #6 0 OOT 8 ZOl @ £6 IT-8-0:8 uosiopuy-uueuy Ny 
(12) (29) (8¢) ($9) (9T) 
0 66 9° 66 0 OOT Cc IO0l 0 Sor TT: 8-0:8 pOIZACT 
saoqy |¥GI-SIT| FII-SOT | FOI-96 | 6-98 | F8-G2 loz moreg eo ie 
sdnoiy O] 











TABLE III.—GROUP INTELLIGENCE TEST PROFILE UNEVENNESS OF IQ GROUPS
(The unevenness of the average IQ group equals 100 by definition.
Numbers of cases in parentheses.
Mental age is held relatively constant)










SUMMARY OF RESULTS 


The results from the five sets of data are shown in Table III. 
Woodrow’s data for mental ages 10:0-11:11 are added at the 
bottom of the table. Generalizing for all six sets of data, 
the profiles of the bright are eight per cent less uneven, and the 
profiles of the dull are ten per cent more uneven than the profiles 
of the average pupils. Within the IQ range from 85 to 114 the 
percentages are seven per cent more unevenness for the dull and 
seven per cent less for the bright. In other words, such departure 
from a straight line relationship as exists is contributed by pupils 
outside these IQ limits, possibly some by the duller, mostly by 
the brighter pupils. The trend of Woodrow's data at mental
age levels 10:0-11:11 is confirmed, and a similar trend is shown
at lower and higher levels. 


DEGREE OF RELATIONSHIP 


Comparisons of group averages bolstered by statements of
significance give an exaggerated impression of relationship. The
data are now presented as coefficients of correlation so that it will
be apparent that the relationships reported above are very low.
Corrections for curvilinearity would leave them low. The
correlations for the first six rows of Table III are −.10, −.07, −.27,
−.09, −.14, and −.08. These coefficients are similar to −.10,
−.02, −.10, and −.10, reported by McNemar¹ for scatter. No
one would think of using such correlations as a basis for individual
predictions.

It was not possible to compute reliability coefficients for
measures of profile spread for all of the sets of data used.
Coefficients were computed for the Detroit-Primary 8:0-8:11 data and
for the Terman-McNemar 13:0-13:11 data, in both cases by using
odd-even scores and the Spearman-Brown formula. The
coefficients were .76 and .33, respectively. The Form L with Form M
coefficients are .37 or .36 and .19 for scatter at the corresponding
mental age levels.²
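The Spearman-Brown step mentioned above takes the correlation between odd and even halves and estimates the reliability of the full-length measure. A minimal sketch of the standard formula follows; the half-test correlations behind the reported .76 and .33 are not given in the article, so the second helper merely inverts the formula to show what they would have to be. The function names are illustrative.

```python
def spearman_brown(r_half, k=2.0):
    """Stepped-up reliability for a test lengthened k times (k = 2 for halves)."""
    return k * r_half / (1.0 + (k - 1.0) * r_half)


def half_correlation_needed(r_full):
    """Invert the k = 2 formula: the odd-even r implied by a stepped-up r."""
    return r_full / (2.0 - r_full)


for r_full in (0.76, 0.33):
    r_half = half_correlation_needed(r_full)
    print(r_full, round(r_half, 3), round(spearman_brown(r_half), 2))
```

Because the correction always raises a positive half-test correlation, the raw odd-even agreement behind a coefficient of .33 is lower still.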

1 Op. cit.

2 McNemar, op. cit., p. 80.

The only other reliability coefficient for the amount of
unevenness among group intelligence test scores which the writer has
computed was obtained from a set of data not included among the
five described above. It was the result of the testing of a small
fourth-grade population (N = 112) with both scale 1 and scale 2 of
the Pintner-Durost Elementary Test. In this computation, as
before, odd-even scores were used and the results stepped up by
the use of the Spearman-Brown formula. But in this case
mental age was not held constant and the reliability was raised
by the presence of twelve tests as compared with seven tests in
the battery. In spite of these positive influences, the coefficient
was only .56. This testing was done carefully with the idea of
getting as high reliability as possible. The Spearman-Brown
coefficient is .97 for the total score. Although total score and
unevenness score are both functions of all the part scores,
individual differences in the latter are much less reliably measured.


INTERPRETATION 


No attempt will be made in this paper to account for the
relationship between 'IQ' and unevenness which has been reported.
It may be the result of several factors. But since the
relationship is low and a great deal of unreliability is involved, one of the
first factors which should be evaluated is the extent to which the
relationship itself is due to unreliability. In other words, 'Is
dullness associated with more than average unreliability in
taking group intelligence tests?'¹ Not until this and other
factors have been investigated may one safely undertake to
summarize the situation with regard to group-test profile
unevenness as Lorr and Meister² did for scatter. In other words, at the
present time, it would be inappropriate to refer to the
demonstrated negative relation between 'IQ' and unevenness as a
relation between 'IQ' and specialization of ability.


1 Miss Florence Snodgrass, a Teaching Assistant in the Department of
Education, is now engaged in an attempt to answer this question.

2 M. Lorr and R. K. Meister, "The Concept of Scatter in the Light of
Mental Test Theory," Educational and Psychological Measurement, 1:
303-310.


REFERENCES


A. W. Brown, Unevenness of the Abilities of Dull and of Bright
Children. New York: Teachers College Contributions to
Education #220, 1926.

A. J. Harris and D. Shakow, "The Clinical Significance of
Numerical Measures of Scatter on the Stanford-Binet,"
Psychological Bulletin, 1937, 34:134-150.

M. Lorr and R. K. Meister, "The Concept of Scatter in the
Light of Mental Test Theory," Educational and Psychological
Measurement, 1:303-310.

Quinn McNemar, The Revision of the Stanford-Binet Scale.
Boston: Houghton Mifflin Co., 1942.

Maud A. Merrill, On the Relation of Intelligence to Achievement
in the Case of Mentally-retarded Children. Comparative
Psychology Monographs, 1924, vol. 2, no. 10.

L. M. Terman, Genetic Studies of Genius. Stanford University:
Stanford University Press, 1925, chap. 12.

L. M. Terman and Maud A. Merrill, Measuring Intelligence.
Boston: Houghton Mifflin Co., 1937, pp. 29-31.

Herbert Woodrow, "Mental Unevenness and Brightness,"
Journal of Educational Psychology, 19:289-302.








A STUDY OF CERTAIN ASPECTS 
OF THE LEE-THORPE OCCUPATIONAL INTEREST 
INVENTORY 


HENRY C. LINDGREN 


Veterans Administration, Advisement and Guidance Division, Branch 12 
Office, San Francisco. 


This study proposes to evaluate some aspects of the use and
possible interpretation of the Lee-Thorpe Occupational Interest
Inventory, Advanced Series, in counseling adult males by (1)
presenting correlations with another commonly used interest
test, and (2) presenting correlations with a measure of mental
ability. The Inventory (hereinafter referred to as OII),
devised by Edwin A. Lee and Louis P. Thorpe, was published
by the California Test Bureau in 1943. The first of the two
sections constituting the inventory consists of one hundred
twenty pairs of items, each describing an activity. The two
hundred forty items which constitute this section are divided
among six scales labeled 'fields of interests': personal-social,
natural, mechanical, business, the arts, and the sciences. The
forty items allotted each field are further distributed as follows:
ten items describing tasks of a low-level, routine nature; twenty
items of a medium-level, skilled nature; and ten items of a
high-level, supervisory, administrative or expert nature. The
authors state in their Manual that this classification by level
was based primarily on the ratings given in the Dictionary of
Occupational Titles to the occupations with which each of the
activities was classified.

A set of three additional scales consisting of ‘types of interests’ 
(verbal, manipulative, and computational) is obtained by 
rescoring some of the responses made on the one hundred twenty 
pairs of items of the first section. Twenty-two of these items
are rescored for the verbal scale, twenty-six for the manipulative,
and nineteen for the computational.

The subject’s ‘level of interests’ is revealed by a single scale, 
the tenth, computed from his responses to the second section of 
this test, which consists of thirty triads of activities, coded to 
prevent recognition by the subject. The three items in each 
set cover three levels of activities in a given occupational area. 




In the field of art, for example, the three activities are concerned 
with copying of sketches, painting of pictures, and teaching of 
art. The subject is here requested to choose among lower level, 
easier work, involving little or no responsibility; work of moderate 
difficulty, skill, and responsibility; and work on a high level of 
difficulty, responsibility, and abstraction. 


TABLE I. COMPARATIVE PERCENTILE SCORES MADE ON OCCUPATIONAL 
INTEREST INVENTORY SCALES BY FIVE HUNDRED MALE HIGH-SCHOOL 
SENIORS AND NINE HUNDRED FIFTY-FOUR MALE VETERANS 

                       10th Percentile     50th Percentile     90th Percentile 
                           Scores              Scores              Scores 
                       High School  Vet-   High School  Vet-   High School  Vet- 
Scales                   Seniors    erans    Seniors    erans    Seniors    erans 

Fields of Interest 
  Personal-Social           6        10        14        17        22        25 
  Natural                   8         7        21        16        33        29 
  Mechanical               11        11        22        21        30        30 
  Business                  6        11        18        20        28        32 
  The Arts                  6         9        17        18        27        28 
  The Sciences             11        15        25        23        33        31 
Types of Interest 
  Verbal                    2         4         7         9        15        16 
  Manipulative              8        10        13        13        17        17 
  Computational             2         3         8         9        14        15 
Level of Interests         46        56        66        70        77        81 























In spite of the low number of items in each scale, the Manual 
reports a high test-retest reliability. The coefficients of reli- 
ability obtained for the six ‘fields of interests’ ranged from 0.82 
to 0.93, while the range for the three ‘types of interests’ was 0.80 













to 0.90, although the latter scales average only twenty-two 
items each. The coefficient of reliability for the ‘level of interests’ 
was 0.71. The norms published in the Manual were obtained 
by administering the test to one thousand twelfth-grade students 
in California high schools. The authors state: ‘‘The norms are 
suitable not only for high-school and junior-college students, but 
also the general adult population.” It is difficult to evaluate 
the reliability of this statement on a comparative basis, since 
the authors present only percentile groupings without means 
or standard deviations. Table I presents comparative 10th, 
50th and 90th percentile scores for male high-school students 
published in the OII Manual and for nine hundred fifty-four 
male veterans who were tested under the Veterans Administra- 
tion advisement and guidance program in California, Arizona, 
Nevada and Hawaii. While the authors of the OII do not 
state which score of the range of scores reported in each decile 
group is the exact percentile score, it is assumed that they fol- 
lowed the usual practice of placing the score in question at the 
bottom of the interval, whereupon the scores given for the 
10th percentile would actually consist of those scores which fall 
in the range covered by the 10th and 19th percentiles. 

While the lack of the proper statistical measures prevents the 
computation of the significance of the deviations in Table I, 
inspection indicates that there seems to be a tendency for 
veterans to achieve higher scores on the personal-social, business, 
and level of interest scales, and lower scores on the natural 
scale. 

During the four years the OII has been available, it has gained 
considerable acceptance on the part of vocational counselors in 
California, as well as elsewhere. The extent of its usage in the 
counseling of veterans in the Veterans Administration Advise- 
ment and Guidance Program in California, Arizona, Nevada and 
Hawaii during a five-week period in the fall of 1946 is presented 
in Table II together with comparative figures for other interest 
tests. The fact that the OII was the second most commonly 
used test of the interest type is not due to any influence on the 
part of the Veterans Administration inasmuch as the Veterans 
Administration does not specifically recommend any one 
test for use by its vocational advisers or by the vocational 
appraisers who, as employees of educational institutions, counsel 










TABLE II. NUMBERS AND TYPES OF INTEREST TESTS ADMINISTERED 
TO 2,165 VETERANS IN CONNECTION WITH VETERANS 
ADMINISTRATION ADVISEMENT AND GUIDANCE PROGRAM 
IN CALIFORNIA, ARIZONA, NEVADA, AND HAWAII, 
SEPTEMBER 21 TO OCTOBER 26, 1946. 

                                                Number 
Name of Test                                    Administered    Per Cent 

Kuder Preference Record                            1,453          61.2 
Occupational Interest Inventory                      584          24.6 
Vocational Interest Blank for Men (Strong)           117           4.9 
Brainard Occupational Preference Inventory           115           4.8 
Cardall Primary Business                              79           3.3 
Cleeton Vocational Interest Inventory                 23           1.0 
A Study of Values (Vernon and Allport)                 4           0.2 

Total                                              2,375         100.0 


veterans in guidance centers under contract with the Veterans 
Administration. It must, therefore, be assumed that the extent 
of this test’s usage is due to certain qualities which tend to make 
it a relatively valuable instrument for the counselors. Coun- 
selors who use the OII report that it is an acceptable counseling 
instrument largely because of the possibilities of using test 
items as leads in interviewing. The authors have facilitated 
this practice by listing the various categories of items which make 
up the six ‘fields of interest.’ For example, the personal-social 
‘field of interest’ is made up of the following sub-scales: domestic 
service, personal service, social service, teaching, law and law 
enforcement, health and medical service. 

Perhaps the best use of the OII is as an interviewing aid. The 
scores and consequent percentiles obtained on its several scales 
are of limited value in an absolute sense: for lack of research, 
no one knows, for example, whether artists really like to do the 
tasks described by the items which form the basis of ‘the arts’ 
scale of the OII. Hence the test seems more useful as an aid to 
interviewing or counseling than as a screening instrument. 










Other limitations of the OII may also be noted: 

a) The cover sheet of the test booklet contains the itemized 
breakdown of each ‘field of interest’ into its component sub- 
units as noted above. With this orientation a poorly motivated 
subject could distort his responses to yield any score he desired. 
This material belongs in the Manual and should not be included 
as part of the test booklet. 

b) Although the reported reliability of the ‘type of interests’ 
is high, the small number of items on which these scales are 
based leads one to question their value. 

c) The lack of research on the OII renders it of less value to the 
counselor than other interest tests concerning which considerable 
research has been published. The fact that the OII has only 
recently appeared may account in part for the lack of published 
research concerning its use. 

d) The OII is time-consuming to score (approximately ten 
minutes) when the inventory booklet is used; when a separate 
answer sheet and scoring templates are used, the OII loses its 
value as an interview aid unless it is used in conjunction with the 
test booklet. However, since this procedure is inconvenient, 
counselors frequently do not use this relatively cumbersome 
technique. 

The research project presented in this paper largely grew out 
of the considerations of (c) above. It was felt further that an 
evaluation of this inventory in terms of scores made on other 
tests would render it more useful to those vocational counselors 
who are using this test or who are considering its use. 

A means of making such an evaluation was suggested by the 
fact that counselors at a Veterans Administration guidance 
center in a large California city occasionally administered both 
the OII and the Kuder Preference Record, (hereinafter referred 
to as KPR), using each as a check on the other. Counselors in 
this guidance center also frequently requested the administration 
of the Gamma Test, Forms A and B, of the Otis Quick-Scoring 
Test of Mental Ability (hereinafter referred to as Otis Gamma). 

In compiling data for this study, every other case was selected 
in which the KPR, the OII, and the Otis Gamma test had been 
administered until fifty had been accumulated. A Pearson 
product-moment correlation was then computed between the 
raw scores each subject had made on the ten scales of the OII 










and raw scores made on the Otis Gamma test. These coefficients 
of correlation and their standard errors are presented in Table III. 


TABLE III. CORRELATION OF RAW SCORES MADE BY FIFTY 
MALE VETERANS ON SCALES OF THE OCCUPATIONAL 
INTEREST INVENTORY WITH RAW SCORES OF 
OTIS GAMMA TEST 

Occupational Interest Inventory Scales       r       σr 

Personal-Social                            -.02     .14 
Natural                                    -.01     .14 
Mechanical                                 -.18     .14 
Business                                   -.10     .14 
The Arts                                    .16     .14 
The Sciences                                .10     .14 
Verbal                                     -.01     .14 
Manipulative                               -.04     .14 
Computational                               .00     .14 
Level of Interests                          .31     .13 


Except for the scores made on the ‘level of interest,’ the cor- 
relations presented in Table III indicate that whatever is meas- 
ured by the OII appears to be independent of mental ability as 
measured by the Otis Gamma test. The correlation obtained 
between the ‘level of interest’ scale and the Otis Gamma test 
seems to be an exception, in that it approaches a significantly 
positive figure. However, a relationship between this interest 
scale and mental ability would be expected on a logical basis 
because of the nature of the items involved. It is to be noted 
that low correlations between interest scale scores and mental 
ability were similarly discovered in the case of the KPR.¹,² 
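
The article does not say how the standard errors in Table III were computed. As a hedged illustration, the sketch below assumes the classical large-sample formula, (1 - r²)/√N, which does reproduce the tabled values for N = 50. The function name `se_of_r` and the choice of rows are introduced here for illustration only.

```python
import math


def se_of_r(r: float, n: int) -> float:
    """Classical large-sample standard error of a Pearson r: (1 - r^2) / sqrt(n).

    Assumption: this is the formula behind the sigma_r column; the article
    itself does not name it.
    """
    return (1.0 - r * r) / math.sqrt(n)


# A few correlations of OII scales with the Otis Gamma test, as printed (N = 50).
printed_r = {"Mechanical": -0.18, "Business": -0.10, "The Arts": 0.16}

for scale, r in printed_r.items():
    sigma = se_of_r(r, 50)
    # The ratio r / sigma_r shows how many standard errors r lies from zero.
    print(f"{scale:12s} r = {r:+.2f}  sigma_r = {sigma:.2f}  ratio = {r / sigma:+.2f}")
```

Read this way, none of the three coefficients shown lies as much as two standard errors from zero, which agrees with the interpretation given in the text.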

The interpretation made in the Revised Manual for the Kuder 
Preference Record³ to the effect that “in guidance, use of the 





1 Adkins, D. C., and Kuder, G. F., “The Relation of Primary Mental 
Abilities to Activity Preferences.” Psychometrika (1940), 251-62. 

2 Triggs, F. O., “A Study of the Relationship of Kuder Preference Record 
Scores to Various Other Measures.” Educational and Psychological 
Measurement (1943), 341-54. 

3 Chicago, Science Research Associates, 1946. 
















preference profile should always include a consideration of 
mental ability,” would apply also to the use of the OII. 

Proceeding on a logical basis, scales of the OII and certain 
scales of the KPR were selected as being the most closely allied. 
The correlations between these scales are presented in Table IV. 
The pairing of these scales was based on the description of the 
interests measured by the ‘fields’ and ‘types of interest’ presented 
in the Manual of the OII and the “Classification of Occupations 
According to Major Interest,” presented in Table I of 
the 1946 revision of the KPR Manual.⁴ 


TABLE IV. CORRELATION BETWEEN RAW SCORES MADE BY 
FIFTY MALE VETERANS ON SCALES OF THE OCCUPATIONAL 
INTEREST INVENTORY AND THE KUDER PREFERENCE 
RECORD 

Occupational Interest      Kuder Preference 
Inventory Scales           Record Scales            r       σr 

Personal-Social            Social Service          .60     .09 
Natural                    Mechanical              .33     .13 
Mechanical                 Mechanical              .72     .07 
Business                   Clerical                .74     .06 
Business                   Persuasive              .52     .10 
The Arts                   Musical                 .42     .12 
The Arts                   Literary                .34     .12 
The Arts                   Artistic                .37     .12 
The Sciences               Scientific              .80     .05 
Verbal                     Literary                .16     .14 
Manipulative               Mechanical              .10     .14 
Computational              Computational           .50     .11 
Level of Interest          Persuasive              .18     .14 


The correlations presented in Table IV indicate in general 
that the logical assumption of similarity between interest patterns 
as measured by the OII and the KPR is to some extent justifiable. 
The correlations reported in Table IV are comparable to those 
reported by other studies of similar scales of various interest 
tests, in that in most cases they confirm relationships the exist- 





4 Ibid. 










ence of which might be presumed on a logical or inspectional 
basis.⁵,⁶,⁷ 

Those correlations in Table IV which are low may be attributed 
to the probability that one scale may involve only part of what 
purports to be measured by the other. For example, the tasks 
involved in the ‘natural’ field of interest in the OII obviously 
constitute only a portion of those which make up the ‘mechan- 
ical’ scale on the KPR. The same situation exists in the cases 
of the ‘business’ and the ‘persuasive’ scales, ‘the arts’ and the 
‘musical’ scales, ‘the arts’ and the ‘literary’ scales, and ‘the 
arts’ and the ‘artistic’ scales. The correlation between the ‘level 
of interest’ scale on the OII and the ‘persuasive’ scale on the 
KPR, between the ‘verbal’ scale and the ‘literary,’ and between 
the ‘manipulative’ scale on the OII and the ‘mechanical’ on the 
KPR tend to be lower than might be expected. 

However, studies made by Triggs indicate that these scales 
on the KPR are fairly valid in terms of scores made on similar 
scales of other interest tests, which would raise further question 
as to the validity of the ‘manipulative’ and ‘verbal’ scales of the 
OII.⁸ The ‘level of interests’ scale on the OII probably measures 
phases of interest apart from those which are measured by the 
KPR ‘persuasive’ scale. 

An additional coefficient of correlation was computed on the 
scores made on the Otis Gamma test and the ‘mechanical’ scale 
of the KPR in order to test the hypothesis of this writer to the 
effect that this scale of the KPR tended to measure a more 
professional or intellectual type of mechanical interest than that 
measured by the ‘mechanical’ scale of the OII. This hypothesis 
was not substantiated by the coefficient of correlation of -0.32 
(σr = 0.13), as compared with a correlation of -0.18 (σr = 0.14) 
between the ‘mechanical’ scale of the OII and the Otis Gamma 
test. 


5 Feder, D. D.; Triggs, F. O.; Wittenborn, J. R., “A Comparison of 
Interest Measurement by the Kuder Preference Record and the Strong 
Vocational Interest Blanks for Men and Women.” Educational and 
Psychological Measurement (1942), pp. 239-57. 

6 op. cit. 

7 Triggs, F. O., “A Further Comparison of Interest Measurement by the 
Kuder Preference Record and the Strong Vocational Interest Blank for 
Men,” Journal of Educational Research (1944), pp. 538-44. 

8 op. cit. 
















Inasmuch as the published norms for the OII do not contain 
such standard statistics as means, medians, standard deviations 
or quartile deviations, comparative research on this test is 
virtually precluded. Table V lists these statistical measures for 
scores made on the OII by nine hundred fifty-four male veterans, 
in order that comparative studies may be made using similar 
data from other populations. 


TABLE V. MEDIANS, MEANS, STANDARD DEVIATIONS AND QUARTILE 
DEVIATIONS OF SCORES OBTAINED BY NINE HUNDRED 
FIFTY-FOUR VETERANS ON VARIOUS SCALES OF THE 
OCCUPATIONAL INTEREST INVENTORY, ADVANCED 
SERIES 

Scale                   Median     Mean       σ        Q 

Personal-Social         16.94     17.25     6.21     4.40 
Natural                 16.71     17.41     8.28     5.86 
Mechanical              21.15     21.05     6.87     5.19 
Business                20.57     21.15     8.01     6.10 
The Arts                18.35     19.69     7.10     4.67 
The Sciences            23.64     23.38     6.05     4.98 
Verbal                  10.08     10.46     4.72     4.27 
Manipulative            13.77     12.66     2.68     1.86 
Computational            8.99      9.30     4.40     3.39 
Level of Interests      70.75     69.63     9.87     7.05 


SUMMARY AND CONCLUSIONS 


1) Because adequate statistical measures are not available, it 
is difficult if not impossible to ascertain whether the percentile 
norms presented by the authors of the Occupational Interest 
Inventory are applicable to the general population as they state. 

2) There appears to be no correlation between scores obtained 
on the various scales of Occupational Interest Inventory, Advanced 
Series, and the scores obtained on the Gamma Test, Otis Quick- 
Scoring Tests of Mental Ability, with the possible exception of the 
‘level of interests’ scale which approaches significant, positive 
correlation with the Otis Gamma test. 

3) In general, positive correlation appears to exist between 
scores made by male veterans on scales of the Occupational 
Interest Inventory, Advanced Series, and scores made on compar- 










able scales of the Kuder Preference Record. This correlation is 
not high enough to warrant substitution of one test for the other, 
nor is this correlation high enough to warrant identical interpre- 
tation of scores made on two similarly-named scales. In general, 
high correlations appear to exist between those scales which 
one would logically suppose to be highly correlated and low 
correlations appear to exist between those scales which one 
would assume to be partially correlated. 

4) No evidence was obtained to substantiate the assumption 
that the Kuder Preference Record mechanical scale measures a 
more professional or intellectual type of interest than does the OII 
mechanical scale. 

5) Before the Occupational Interest Inventory can achieve the 
professional acceptance accorded other interest tests further 
studies are indicated, especially the validation of its scales 
on the basis of responses made by persons employed in various 
occupations. 








BASAL METABOLISM 
AND ACADEMIC PERFORMANCE 
IN A SAMPLE OF COLLEGE WOMEN* 


HAROLD GRIER McCURDY 
Meredith College 


Surprisingly little attention has been paid to the possibility 
of a relationship between basal metabolism and scholastic success. 
An article by Witty and Schacter* on hypothyroidism, mainly 
suggestive, and one by Patrick and Rowles,⁵ reporting an 
insignificant correlation coefficient of .05 between BMR and 
point-hour ratio in the case of fifty-two college women, seem 
to comprise the total published literature. As a kind of footnote, 
mention might also be made of the work of Benedict and others 
showing a slight rise of metabolic rate as a consequence of 
intellectual effort.’ 

More numerous are the studies exploring the relation between 
basal metabolism and intelligence. In the case of basal metab- 
olisms lying within the normal range, the trend of the evidence is 
strongly against the supposition of a significant relationship.*® 
A recent article by Gaskill and Fritz,* for example, reporting 
on a study of over six hundred college students of both sexes, 
concludes that basal metabolism, expressed in three ways, 
has no bearing on intelligence as measured by two college aptitude 
examinations. Earlier studies by Shock and Jones,’ by Patrick 
and Rowles,⁵ and by others, using a variety of subjects and tests, 
come to the same conclusion. The work of Hinton,‘ who 
reports finding very high positive correlations for children, 
stands as a startling and lonely exception. 

The study to be reported here bears out the general trend of 
conclusions in regard to intelligence, but offers support for the 
hypothesis that some relationship exists, in the case of normal 
subjects, between basal metabolism and scholastic achievement. 


SUBJECTS AND COLLECTION OF DATA 


The subjects of the present study were thirty college women 
who in the fall semester of 1946 were enrolled in the writer’s 





* The writer wishes to express his appreciation for the assistance of Imo- 
gene Grainger, Helen Norville, Jean Griffith, and Shirley Powell. 


classes in introductory psychology. They were predominantly 
sophomores, but two were classified as freshmen (because of 
point-hour standing), five as juniors, and three as seniors. 
Selection was on the basis of grades made in the psychology 
course, which were calculated from six objective tests given 
at intervals throughout the semester, and the aim was to have 
a sample representative of the various levels of performance in 
that course. Failure to secure the coöperation of some of those 
selected for the basal metabolism tests introduced chance 
irregularities in the planned distribution, and somewhat reduced 
the total number expected; but the representative sample aimed 
at was approximately achieved. As indicated in Table Ia, 
the psychology grades ranged from 65 to 90, distributed in step 


TABLE Ia. SUMMARY OF MEASUREMENTS, THIRTY COLLEGE 
WOMEN 

                            Range         Mean      SD 
(1) Psychology Grades     65 to 90       76.57     6.82 
(2) Otis Scores           38 to 71       51.20     8.08 
(3) BMR                  -22 to +6       -7.70     7.15 
(4) Age                   17 to 22       19.53      .98 


intervals of two as follows, beginning with interval 64-65: 1, 3, 
2, 2, 0, 6, 2, 5, 2, 2, 1, 2, 1, 1. An incidental consideration in the 
selection of the subjects was that they were among those enrolled 
in the 1947 spring semester course in general experimental 
psychology at the time that the metabolism testing was under- 
taken. This fact excluded from the selected group a few indi- 
viduals who had made lower grades in the fall semester and had 
thus not been permitted to continue in psychology in the spring. 

The Otis scores summarized in Table Ia were secured during 
the fall semester. Forms A and B of the Otis Self-Administering 
Test of Mental Ability, Higher Examination, were used, one 
form in October and the other in December, about two months 
apart. The time limit in each case was thirty minutes. Four 
points were deducted from the scores made at the second testing, 
as recommended by Otis, and the two scores were averaged. 
A check on the scores obtained on the two occasions showed an 
average rise of about four points, as Otis predicts, so that the 
deduction of points was apparently well advised. 

The BMR scores summarized in Table Ia were secured in 










March, April, and May of 1947. A new Jones Motor-Basal 
Metabolism Apparatus, which measures the rate of consumption 
of an exact liter of oxygen, was used as the testing instrument. 
Determinations were made by the writer and student assistants 
who had received some previous training with the apparatus. 
In the case of this instrument, a check on the adequacy of the 
technique is afforded by the nature of the record, so that leaks, 
marked changes in the breathing, changes in the rate of oxygen 


TABLE Ib. COEFFICIENTS OF CORRELATION,* WITH PSYCHOLOGY 
GRADES AS THE CRITERION 

r12    = .57       r13    = .43       r14    =  .19 
r12.3  = .60       r13.2  = .48       r14.3  = -.03 
r12.34 = .63       r13.24 = .48       r14.23 = -.14 
R1(23) = .69       R1(234) = .72 

* P values of correlation coefficients in the first two columns are equal to 
or less than two per cent. 


consumption during the experimental period, etc., can be 
detected and faulty determinations eliminated. The writer 
took due account of all such irregularities. In scoring the 
oxygen consumption records, use was made of the slide-rule 
provided with the apparatus. This device expresses the indi- 
vidual BMR as a plus or minus or zero deviation from the 
norms for the given sex, age, height, and weight. The ages, 
heights, and weights used in these estimations were those 
reported by the subjects, who had during the year been meas- 
ured for height and weight in the nude by the college health 
department. 

The BMR subjects came to the laboratory early in the morning, 
after abstaining from food for ten or more hours, rested on a cot 
for twenty to thirty minutes, and then were given two tests in 
succession. Of these two tests, the one showing the lower rate of 
oxygen consumption was the basis for the scores represented in 
Table Ia; usually, but not always, this was the second of the two. 
In seven instances, determinations were made also two or three 
weeks later, but were not used in this study because it was not 
possible to do the same for all subjects; these later readings were 
fairly similar to the first ones, though they tended to be slightly 
lower. It hardly needs to be added that the subjects were tested 
one at a time, and were shielded from outside disturbances as 










far as possible. The BMR’s thus obtained, while certainly far 
from errorless or absolutely basal, may be taken as approxima- 
tions of the sort to be expected in routine clinical work. 

The age characteristics of the group, as given in Table Ia, are 
based on the reports of the subjects at the time of the metabolism 
testing. 

Table IIa, which summarizes data for twenty-eight of the 
thirty subjects, includes as a new item the quality point-semester 
hour ratio based on the over-all academic records of the subjects 
for the year and a half preceding the spring semester of 1947. 
Two of the subjects represented in the first Table had to be 


TABLE IIa. SUMMARY OF MEASUREMENTS, TWENTY-EIGHT 
COLLEGE WOMEN 

                            Range         Mean      SD 
(1) Point-Hour Ratio      .3 to 2.4       1.19      .55 
(2) Otis Scores           38 to 62       49.97     6.86 
(3) BMR                  -22 to +6       -7.79     7.30 
(4) Age                   17 to 22       19.46      .91 


omitted from this one because their records did not extend back 
unbrokenly for a year and a half at this college. The two 
omitted subjects, aged twenty-one and nineteen, had Otis scores 
of 71 and 66, psychology grades of 90 and 87, and BMR’s of 
—1l and —11, respectively. Their omission nips off the top 
of the Otis distribution. 


STATISTICAL ANALYSIS 


In Tables Ia and Ib, where all thirty subjects are dealt with, 
attention is focussed on the relation of psychology grades as the 
criterion of academic performance with the other three measures. 
The expected order of relationship exists with Otis scores. 
Relationship with age is insignificant. The immediately inter- 
esting fact, however, is the significant relationship with BMR, 
which continues to stand up satisfactorily when Otis scores 
and age are partialled out. The first multiple R indicates that 
forty-eight per cent of the variance in the psychology grades 
is related to Otis scores and BMR combined. Inclusion of age 
raises this figure to fifty-two per cent. 
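
The partialling-out and the first multiple R described above can be retraced from the zero-order coefficients. The sketch below is only an illustration under stated assumptions: it uses the standard first-order partial correlation formula, takes the Otis-BMR intercorrelation of .06 from the table of interrelations given later in the article, and introduces the helper names `partial_r` and `multiple_r`, none of which appear in the original.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def multiple_r(r12: float, r13: float, r23: float) -> float:
    """Multiple correlation of variable 1 with variables 2 and 3 combined.

    Uses the identity R^2 = r12^2 + (1 - r12^2) * r13.2^2.
    """
    r13_2 = partial_r(r13, r12, r23)
    return math.sqrt(r12**2 + (1 - r12**2) * r13_2**2)


# (1) psychology grades, (2) Otis scores, (3) BMR, thirty subjects.
r12, r13, r23 = 0.57, 0.43, 0.06

r12_3 = partial_r(r12, r13, r23)   # grades with Otis, BMR held constant
r13_2 = partial_r(r13, r12, r23)   # grades with BMR, Otis held constant
big_r = multiple_r(r12, r13, r23)

print(f"r12.3 = {r12_3:.2f}, r13.2 = {r13_2:.2f}")
print(f"R1(23) = {big_r:.2f}, share of variance = {100 * big_r**2:.0f} per cent")
```

Because Otis scores and BMR are nearly uncorrelated with each other, holding one constant changes the other's relation to the criterion very little, which is why the partial coefficients stay close to the zero-order ones.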

Turning to Tables IIa and IIb, where the criterion of academic 
performance is point-hour ratio, we see that the total coefficients 










of correlation are significant for all three variables. The coeffi- 
cient for BMR is somewhat reduced when both Otis scores and 
age are partialled out, but still remains at better than the two 
per cent level of significance? (p. 202). The coefficient for age, on 
the other hand, is reduced considerably below the ten per cent 
level by the partialling out of both Otis scores and BMR. The 


TABLE IIb. COEFFICIENTS OF CORRELATION,* WITH POINT-HOUR 
RATIO AS THE CRITERION 

r12    = .46       r13    = .53       r14    = .42 
r12.3  = .52       r13.2  = .59       r14.2  = .48 
r12.34 = .54       r13.24 = .47       r14.23 = .28 
R1(23) = .71       R1(234) = .72 

* P values of correlation coefficients in the first two columns are less than 
two per cent; the first two figures in the third column are also significant at 
about the same level. 


multiple R’s resemble those in Table Ib. With either criterion 
of academic performance, then, the scheme of relationships 
turns out to be about the same, the relation of BMR to the 
criterion remaining significant throughout, while that of age in 
the second instance proves to be somewhat superficial. 

Table III displays some other relations among the non-criterion 
variables. The insignificant relationship between Otis scores 
and BMR is in harmony with the major portion of the literature 


TABLE III. INTERRELATIONS OF (2) OTIS SCORES, (3) BMR, AND 
(4) AGE OF THIRTY COLLEGE WOMEN 

r23   =  .06       r24   = .15       r34   = .49* 
r23.4 = -.02       r24.3 = .14       r34.2 = .49* 

* P value is less than 1%. 


dealing with the physiology of intelligence; the slight relationship 
with age is, again, not of a significant order. The relationship 
between age and BMR, however, though not expected, is clearly 
significant for this sample, with a P value of less than one per 
cent. That this relationship exists largely apart from the 
relations of these two variables to the criteria of academic per- 
formance is brought out by the preceding tables. 
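
The significance statements for these coefficients can be checked roughly with the usual t test for a correlation coefficient. The article does not say which test or table was consulted, so the following is only a sketch under that assumption. The names `t_for_r` and `T_CRIT_28` are introduced here, and the critical values are the standard two-tailed ones for 28 degrees of freedom.

```python
import math


def t_for_r(r: float, n: int) -> float:
    """Student's t for testing a Pearson r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Two-tailed critical values of t for 28 degrees of freedom (standard tables).
T_CRIT_28 = {"5%": 2.048, "2%": 2.467, "1%": 2.763}

# Zero-order interrelations for the thirty subjects.
for label, r in {"Otis-BMR": 0.06, "Otis-age": 0.15, "BMR-age": 0.49}.items():
    t = t_for_r(r, 30)
    passed = [level for level, crit in T_CRIT_28.items() if abs(t) > crit]
    print(f"{label:9s} r = {r:.2f}  t = {t:.2f}  exceeds: {passed or 'none'}")
```

On this reckoning only the coefficient for age and BMR clears the one per cent value, in line with the asterisks in the table.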













DISCUSSION 


The sample used in this study is small. To what extent is it 
representative of the particular college population from which it 
was drawn? On the question of the total year by year popula- 
tion of the college, the writer can only offer an opinion; but since 
the opinion is based on six years’ acquaintance with the popula- 
tion, he thinks it something better than a guess that at least 
with respect to psychology grades, Otis scores, and point-hour 
ratio the experimental sample is not at all anomalous. With 
respect to its conformity to the subgroup from which it was 
directly taken, it is possible to be more precise. The mean Otis 
score for this subgroup of seventy-nine students was 50.35; the 
mean psychology grade, 76.05. The respective standard devia- 
tions were 7.49 and 5.98. A further indication of similarity 
is that the correlation coefficient for Otis scores and psychology 
grades in the group of seventy-nine was .56. Comparison of 
these figures with those given in the tables for the experimental 
sample will show how close the resemblance is in these respects. 
Whether the sample is equally representative with respect to 
BMR is a matter of inference, since there are no data available 
on that point, but there is no particular reason to suppose 
otherwise. 

On the larger question of the representativeness of the sample 
for college women in general, no confident statement can be made. 
There is reason to believe that this sample and the particular 
population it represents do not conform to the central tendency in 
every respect, but it would not be polite to discuss the evidence, 
nor very profitable. More can be gained by comparing the data 
of the present study with the only similar one known to the 
writer. 

Table IVa presents some pertinent facts on the fifty-two college 
women from the home economics department of Ohio University 
studied by Patrick and Rowles.⁵ Table IVb analyzes the differences 
between that group and the writer’s in terms of the standard error. 
It will be seen that the Ohio sample differs significantly 
from the writer’s in having a higher mean point-hour ratio, a 
higher mean age, and a greater age variability. The difference 
between the BMR means does not reach the five per cent level 
of significance? (p. 75). Probably the writer’s sample falls 










below the Ohio group in mean intelligence, but the two scales 
used cannot be directly compared. 
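
A difference "in terms of the standard error" can be illustrated as follows. This is a sketch only: it assumes independent samples, a standard error of σ/√N for a mean and σ/√(2N) for a standard deviation, and the summary figures printed in Tables IIa and IVa. The function names are hypothetical, and the author's own computing routine is not described in the article.

```python
import math


def mean_diff_in_se(m1: float, sd1: float, n1: int,
                    m2: float, sd2: float, n2: int) -> float:
    """Difference between two independent means, in units of its standard error."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return abs(m1 - m2) / se


def sd_diff_in_se(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Difference between two independent SDs, in units of its standard error."""
    se = math.sqrt(sd1**2 / (2 * n1) + sd2**2 / (2 * n2))
    return abs(sd1 - sd2) / se


# Ohio sample (N = 52) against the writer's sample (N = 28).
ratio_phr = mean_diff_in_se(1.50, 0.51, 52, 1.19, 0.55, 28)   # point-hour ratio means
ratio_age = mean_diff_in_se(21.1, 2.6, 52, 19.46, 0.91, 28)   # age means
ratio_age_sd = sd_diff_in_se(2.6, 52, 0.91, 28)               # age variability

print(f"point-hour ratio means: {ratio_phr:.2f}")
print(f"age means:              {ratio_age:.2f}")
print(f"age SD's:               {ratio_age_sd:.2f}")
```

Under these assumptions the three ratios come out within a hundredth or so of the corresponding entries reported for point-hour ratio and age, the small discrepancies presumably reflecting rounding of the published means and standard deviations.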

The correlation coefficients reported by Patrick and Rowles 
which are useful in the present discussion are: between point-hour 
ratio and intelligence, .48; between point-hour ratio and BMR, 
.05; between point-hour ratio and age, .21; between age and 


TABLE IVa. SUMMARY OF MEASUREMENTS, FIFTY-TWO COLLEGE 
WOMEN STUDIED BY PATRICK & ROWLES 

                           Mean       SD 
(1) Point-Hour Ratio       1.50       .51 
(2) BMR                   -7.3       8.97 
(3) Age                   21.1       2.6 


BMR, .00; between intelligence and BMR,°.01. These results 
differ from the writer’s mainly in the low or zero correlation of 
BMR with point-hour ratio and with age. 

What can be said about these disparities of correlation? It 
could be argued in regard to the low correlation of BMR with 
point-hour ratio in the Ohio group that these home economics 
students were subjected to less severe grading than the students 
in the writer’s sample, as indicated by the higher point-hour ratio 


TABLE IVb. DIFFERENCES IN TERMS OF THE STANDARD ERROR 
BETWEEN CORRESPONDING MEASURES IN TABLES IVa AND IIa 

                           Means      SD’s 
(1) Point-Hour Ratio       2.46*       .45 
(2) BMR                    1.78       1.27 
(3) Age                    4.10*      5.98* 

* Significant to the two per cent level or better. 


mean, so that in their case differences in academic performance 
which might have been introduced by differences in basal 
metabolism could not so readily appear. It could be argued 
that, as regards the absence of correlation with age, the Ohio 
sample, being older, and apparently having settled definitely 
on a major, was again somehow less affected by the sort of 
academic competition into which the BMR might significantly 








370 The Journal of Educational Psychology 


enter. In the writer’s sample, for instance, it is possible that the 
older students were a better adjusted group than the younger 
ones, more certain of their status, more successful in extra- 
curricular ways, and that this superior emotional and social 
adjustment was reflected in a higher BMR. Such ad hoc argu- 
ments, however, are merely suggestive, and of very doubtful 
validity. The important possibility that differences in the 
determination of the BMR in the two experiments might account 
for the different results is once again a matter that cannot be 
satisfactorily discussed at this juncture. 

If we assume, in spite of the somewhat divergent character-
istics of the two samples, that they were really drawn from the
same fundamental population, it may be enlightening to com-
bine some of the correlation coefficients with the aid of a z
transformation² (p. 197). By this manner of combination,
which takes account of the numerical weight of each sample, the
.49 correlation between age and BMR found in the writer’s
sample drops to .19 for the two samples combined, and thus
fails of significance at the five per cent level. The combined
coefficient of correlation for BMR and point-hour ratio, however,
stands at .24, which, for the number of observations involved,
does meet the formal five per cent level of significance require-
ment. The conclusion continues to be justified, then, that a
real relationship exists between BMR and academic performance.
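
The manner of combination referred to above may be sketched as follows. The sketch assumes the usual form of the procedure: each r is converted to z, the z values are averaged with weights of n - 3, and the mean z is converted back to r. The function name is illustrative only, and the figure of thirty taken for the writer's sample is a hypothetical value, the sample size not being stated in this passage.

```python
import math

def combine_correlations(samples):
    """Pool correlation coefficients by way of the z transformation.

    samples: list of (r, n) pairs. Each r is converted to z = atanh(r),
    weighted by n - 3 (the reciprocal of the sampling variance of z),
    averaged, and converted back with tanh.
    Returns (pooled_r, critical_ratio), where critical_ratio is the pooled z
    divided by its standard error, 1 / sqrt(sum of weights).
    """
    weights = [n - 3 for _, n in samples]
    z_values = [math.atanh(r) for r, _ in samples]
    total_weight = sum(weights)
    pooled_z = sum(w * z for w, z in zip(weights, z_values)) / total_weight
    critical_ratio = pooled_z * math.sqrt(total_weight)
    return math.tanh(pooled_z), critical_ratio

# Age-BMR correlations: .49 (writer's sample) and .00 (Patrick and Rowles, n = 52).
N_WRITER = 30  # hypothetical value; the sample size is not given in this section
pooled_r, ratio = combine_correlations([(0.49, N_WRITER), (0.00, 52)])
print(round(pooled_r, 2), round(ratio, 2))
```

Under the assumed sample size the pooled coefficient comes out close to the .19 reported, and the ratio of the pooled z to its standard error falls short of 1.96, the value ordinarily required at the five per cent level.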

It seems ordinary common sense that sustained school work 
should be affected in some degree by the energy level of the 
student. Alertness in class and continual study require some- 
thing more than abstract intelligence, and it is surely plausible 
that the rate of metabolism habitual with the individual is a 
part of the determining conditions. In the paper by Witty and
Schacter,⁸ cited approvingly by Woodworth and Marquis,⁹ this
common-sense view is supported by clinical evidence, including 
the description of a retarded schoolboy of normal intelligence 
whose school performance appeared to benefit from thyroid 
treatment for his low, but not especially abnormal, basal metabo- 
lism. While the correction of a low metabolism cannot by any 
means be always a simple matter, to be solved by a pill or an 
injection, nor be always medically indicated, it would seem that 
in this direction there lies a possibility of practical help in some 
cases of inadequate school progress. We need, however, to carry
the investigation of these plausibilities a good deal further before
we can arrive at solid results. 


SUMMARY 


The results of the present investigation of a sample of women 
in a southern college support the hypothesis that basal metabolism 
is related to academic performance, interacting with intelligence 
as measured by the Otis test in this instance to account for 
around fifty per cent of the variance in grades. The sample in 
question appears to be representative of the college population 
from which it was immediately drawn, though in certain respects 
it perhaps departs from the central tendency of the national 
population of college women at large. Comparison of the data 
and correlations with those reported in the study by Patrick
and Rowles⁵ reveals some differences between the experimental
samples, but does not negate the conclusion that basal metabo- 
lism should be considered along with intelligence as a probable 
determinant of the level of school work. This conclusion is in 
harmony with common-sense views and with the clinical evidence
reported by Witty and Schacter.⁸ Further investigation should
prove fruitful. 


REFERENCES 


1) DuBois, E. F. Basal metabolism in health and disease.
Philadelphia: Lea & Febiger, 1936.

2) Fisher, R. A. Statistical methods for research workers.
Edinburgh: Oliver & Boyd, 1941.

3) Gaskill, H. V., and Fritz, M. F. “Basal metabolism and
the college freshman psychological test.” J. gen. Psychol.,
1946, 38, 29-45. (Abstract seen.)

4) Hinton, R. T., Jr. “A further study on the rôle of the
basal metabolic rate in the intelligence of children.” J. educ.
Psychol., 1939, 30, 309-314.

5) Patrick, J. R., and Rowles, E. “Intercorrelations among
metabolic rate, vital capacity, blood pressure, intelligence,
scholarship, personality and other measures on university
women.” J. appl. Psychol., 1933, 17, 507-521.

6) Shock, N. W. “Physiological factors in behavior.” In:
Hunt, J. McV., Personality and the behavior disorders. New
York: Ronald Press, 1944. Vol. I, 582-618.

7) Shock, N. W., and Jones, H. E. “The relationship
between basal physiological functions and intelligence in adoles-
cents.” J. educ. Psychol., 1940, 31, 369-375.

8) Witty, P. A., and Schacter, H. S. “Hypothyroidism
as a factor in maladjustment.” J. Psychol., 1936, 2, 377-392.

9) Woodworth, R. S., and Marquis, D. G. Psychology.
New York: Henry Holt & Co., 1947.








THE EFFECT OF DISTRACTIONS 
ON TEST RESULTS 


DONALD E. SUPER, WILLIAM F. BRAASCH, JR.,
AND JOSEPH B. SHAY


Teachers College, Columbia University 


Current practices in test administration specify that the place
of testing be free from distractions. For example, Bingham¹
states that the examiner “will secure suitable quarters, free
from disturbances and interruptions” but does not define a
‘disturbance’ nor an ‘interruption.’ In practice psychometrists
realize that the complete elimination of stimuli is impossible to
attain and so direct their efforts to reducing distractions to a
minimum. That certain disturbances might have a favorable
effect on the test situation has been pointed out by Terman
and Merrill. According to their observations, familiar sounds
“are reassuring to a child who is inclined to be a bit timid.”
The authors also report that in their experience excellent testing
may be done “under very inadequate physical conditions.”
Apparently the effect of specified distractions on test results 
needs to be more adequately determined, with the distractions 
described in sufficient detail for their nature to be clear and 
with an experimental design which permits the drawing of 
verifiable conclusions. 

The purpose of this experiment was, therefore, to study the 
effect of certain commonly encountered distractions on test 
results. On the basis of clinical observations the hypothesis 
was established that group test scores would not be appreciably
affected by commonly occurring distractions.


PROCEDURE 


The tests used were the Minnesota Vocational Test for Clerical 
Workers and the Otis Quick-Scoring Mental Ability Test, Gamma 
Am. The directions as given in the manual were used with the
following changes: on the clerical test, Part II, Name Comparison,
was given first, and on the Otis the time limit was reduced to
twenty minutes. This latter change meant that an IQ could 
not be calculated, but the nature of the experiment required 
only that the raw score be obtained. 

The subjects were two groups of graduate students, ages 
twenty-two to thirty-eight, taking a course in testing. There
were thirty in the distracted group and twenty-six in the control
group. The division of the class into sections had been made 
earlier on a random basis for instructional purposes. The Names 
part of the clerical test was used to determine the equality of 
the groups as it is a measure of the two abilities measured by the
other tests (general intelligence and speed of discrimination); no
distractions were planned during its administration. 

Since the tests were administered during the regular laboratory 
period the impression given by the examiner was that they were a 
demonstration of group test administration and that the results 
were to be analyzed to ascertain the effect of administering 
the Names Test before rather than after the Numbers Test. 

The instructions for the distractions during the various tests
were as follows:

“1. Minnesota Clerical Names Test

“None

“2. Minnesota Clerical Numbers Test

“a) At the end of the second minute of testing the trumpeter
will play the scale up, then down, will pause thirty seconds, then
play the scale back up. The trumpeter will be in the next room
and stand facing the closed connecting door.

“b) At the end of the fourth minute Mr. Hummel will burst
into the room, stop short, look around, tiptoe to the examiner in
an exaggeratedly quiet manner, whisper hoarsely ‘How long
are you going to be here?’ and then exit on tiptoe, leaving the
door slightly ajar.

“3. Otis Test of Mental Ability

“a) While marking the answer to the third question, Miss
Furstman will break her pencil point with a loud snap. She
will then make a mild exclamation as she drops the pencil,
slide the chair back with a scraping noise, get up and walk with
ostentatious care to the examiner for another pencil.

“b) At the end of the fourth minute Mr. Hummel and Mr.
Appel will walk down the stairs from the fourth floor, arguing
loudly on Schwellenbach’s suggested ban of the Communist
Party. The discussion near the door should last for about one
minute. The examiner will have placed himself on the far side
of the room so as to arrive at the door at about the time the two
men are ready to move on.

“c) At the end of ten minutes the trumpeter will play six bars
of ‘Home Sweet Home,’ falter, recover, and go on to finish
the melody. The trumpeter will give the impression that the
melody is being played by a novice. The location will be the
same as described in 2a) above.

“d) At the beginning of the test, the examiner will set the
timer to ring at fifteen instead of twenty minutes. When the
bell rings the examiner will pick up the timer, look at it, look at
his stopwatch, and announce, ‘Go on with the test.’”

The inclusion of musical distractions was not incongruent since 
the music department uses nearby rooms for practice. 

None of the distractions went unnoticed during the test period.
One minute after the Minnesota Numbers Test started, the
playing of a piano in the room below could be heard. This did
not attract as much attention as the trumpet, which caused a
number of students to look up, some to snicker, and one to
inquire facetiously, “Was that intentional?” The examiner
shook his head. 

The entrance of the examiner’s ‘friend’ caused a few of the 
students to look up. 

The pencil-breaking incident at the beginning of the Otis 
caused some murmuring among the students in the immediate 
vicinity. 

The argument in the hall was aided by passing students who 
joined in the discussion, thinking it genuine, and added to the 
commotion in a realistic way. Several of the subjects looked up 
and one or two looked out of the door at the disputants. 

The trumpeter’s second effort resulted in many of the students 
looking up and a good deal of snickering. 

A general reaction of annoyance at the mis-timing of the Otis 
was expressed by unintelligible muttering. 

After the test a few comments on the quality of the trumpeting 
and the fact that the examinees had been ‘cheated out of two 
seconds’ were made. A discussion of the experiment the follow- 
ing week confirmed the belief that none of the class was aware of 
the real intent of the experiment. 


RESULTS 


The tests were scored according to standard procedures and the 
results were tabulated by age and sex for each group. No differences
approaching statistical significance were found for either
age or sex, so combined data were used for comparisons. 

Table I gives the means, standard deviations, and critical 
ratios for each of the tests. None of the differences were sta- 
tistically significant even at the ten per cent level. 


TABLE I

                          Mean     SD      C/R
Names
  Experimental Group     148.2   24.54    1.6
  Control Group          135.5   34.86     -
Numbers
  Experimental Group     132.7   30.62    0.676
  Control Group          138.1   27.68     -
Otis
  Experimental Group      55.4    8.43    0.75
  Control Group           52.1    9.29     -

The greatest mean difference obtained was for the Minnesota 
Names Test which was administered without distractions. The 
difference of 12.7 points is not statistically significant, since a 
critical ratio larger than that of 1.6 could be expected in ten cases 
out of one hundred. Critical ratios larger than .676 and .75 can 
be expected in more than forty cases out of one hundred. 
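
The reasoning of the preceding paragraph may be illustrated by a short sketch. It assumes the large-sample form of the critical ratio, in which the difference between two means is divided by the square root of the sum of the squared standard errors, and it reads the chances from the normal curve, both tails taken together. The authors do not state their formula, so the function names and the form of the standard error are assumptions, and agreement with every printed ratio is not guaranteed.

```python
import math

def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by its standard error.

    Assumes the large-sample form: SE = sqrt(sd_a^2 / n_a + sd_b^2 / n_b).
    """
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

def two_tailed_probability(ratio):
    """Chance of a ratio at least this large in absolute value under the normal curve."""
    return math.erfc(abs(ratio) / math.sqrt(2))

# Names test: thirty in the distracted group, twenty-six in the control group.
names_ratio = critical_ratio(148.2, 24.54, 30, 135.5, 34.86, 26)
print(round(names_ratio, 2))

# Chances associated with the ratios reported in Table I.
for reported in (1.6, 0.676, 0.75):
    print(reported, round(two_tailed_probability(reported), 2))
```

On these assumptions the Names ratio falls a little above 1.5, and a ratio of 1.6 corresponds to roughly eleven chances in one hundred, in keeping with the statement in the text.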

The conclusion reached is that the distractions were not 
sufficiently disturbing to affect the performance of the group. 


SUMMARY 


This experiment was designed to determine whether or not 
some of the more commonly occurring distractions have an effect 
on group test results. 

Two groups of graduate students were given the Minnesota 
Vocational Test for Clerical Workers and the Otis Test for Mental 
Ability. Several commonly occurring distractions were staged 
for one group during the Numbers part of the Clerical Test and 
during the Otis Test. None of the subjects was aware of the 
nature of the experiment. 

No statistically significant differences were found. The con-
clusion was drawn that commonly occurring distractions do not
affect test results.
















REFERENCES 


1) Bingham, W. V. Aptitudes and Aptitude Testing, New
York, Harper, 1937.

2) Ligon, E. M. “Administration of Group Tests,” Educ.
Psychol. Meas., 1942, 2, pp. 387-400.

3) Terman, L. M. and Merrill, M. A. Measuring Intelligence, 
Boston, Houghton Mifflin Co., 1937. 











BOOK REVIEWS 


F. BAUMGARTEN. Die Psychologie der Menschenbehandlung im
Betriebe. Zurich: Heft 4 der Schriften zur Psychologie 
der Berufe und der Arbeitswissenschaft, 1946. 


The American psychologist in reading Dr. Baumgarten’s book 
will gain little but the realization that what she teaches her 
students at the University of Bern is antiquated when compared 
with what our students learn in courses on human relations in 
industry. Moreover, he may feel justified pride that American 
industrial psychology is far superior to comparable European 
efforts, assuming that this book represents a fair sample of 
European industrial psychology. Even the title reveals a
difference in viewpoints if it is translated literally. In the
Psychological Abstracts (XXI, 179) the German title is rendered as
“The psychology of human relations in industry.” A verbatim
translation of the title reads “The psychology of the treatment
of man in shops,” a more adequate description of the book’s
content. It also reveals the difference between the author’s
viewpoint and modern American approaches to this problem.

A cursory inspection of the index emphasizes that the author’s 
basic attitudes are alien to the thinking of modern American 
industrial psychologists. For instance, of the book’s three
hundred odd pages, the longest section (eighty-three pages) is
devoted to “critical moments of the treatment of men.” The
author analyzes four such critical moments. The first thirty-four
pages are devoted to a discussion of the commanding (‘Das
Befehlen’) of employees by their superiors. The first of this
chapter’s subtitles reads, ‘Disobedience.’ The largest sub-
division of this chapter (sixteen pages) is titled “the technique
of commanding.” This chapter is followed by one (eight pages)
on supervision, and by another (twenty-eight pages) on repri- 
manding. Logically the fourth and last subdivision (nine 
pages) deals with punishments. Compared with a total of 
eighty-three pages devoted to such problems as commanding 
and reprimanding, only twenty-two pages are devoted to dis- 
cussing incentives to work. Similarly, chapter five, devoted 
to the employee, consists of only six pages out of a total of over 
three hundred. 

This enumeration creates an impression which is not rectified
by careful reading. Although the author quotes from the
experience of big industrial plants, the predominant viewpoint 
is that of the small shop owner who may employ twenty or thirty 
workers. Tacitly it is assumed that the ‘boss’ knows every 
employee and is ready to fire him at a moment’s notice if he 
should incur disfavor. The frame of reference is throughout 
that of a paternalistically oriented small manufacturing plant. 
A great deal of space is devoted to disapproval of the master- 
servant psychology, but, nevertheless, it is predominant all 
through the treatise. 

On various occasions the author objects to a viewpoint which 
bases evaluation of employer and employee on single traits. 
She insists that the individual is more than the sum of a few 
traits for which employers look when selecting their employees. 
Still, the psychology applied in the book is basically a trait 
psychology. On page 215 the author presents a list of fifty 
qualities which, according to the literature, supposedly make 
for a good employer. Examples of these traits are numbers 28,
“iron self discipline,” and 31, “energetic and strict attitude.”
Baumgarten objects to such enumeration of traits, but then 
turns around to enumerate for pages those traits which make for 
a good ‘boss.’ 

The backwardness in social psychological attitudes which this 
book reveals can be understood only on the basis of an economic 
and social situation very different from ours. Baumgarten 
writes about settings in which the employee is continuously 
afraid of losing his job, whereas the ‘boss’ is free to fire whomever 
he wishes, because he has an inexhaustible pool of unemployed
labor from which to choose. Nowhere does the book reveal 
the attitude that the art of the employer is to accomplish the 
best with the employees he employs at the moment. The author 
is to some degree aware of the backwardness of European indus-
trial relations. On page 222 she says that “in European indus-
trial plants only now one finds the beginning of some social
democratization.” 

A few quotations may illustrate this: the author thinks that 
there should always be a marked social distance between superior 
and inferior; although it might occasionally be narrowed down, 
it never should disappear. On page 75 she says: “the modern
employee (particularly workers in factories) have distorted
viewpoints, because they are stirred up (verhetzt) by propaganda.
Since youth they have been incited to inimical attitudes against 
employers by the newspapers of their parties, by their associa- 
tions, and so on. Whatever entered their lives as something 
superior was interpreted by them since childhood as narrowing 
them down. Particularly they are inoculated with the idea 
that they are objects of exploitation. Therefore, they always 
feel themselves sacrificed to social injustice.” 

In regard to her positive or negative recommendations to 
employers and employees the author writes pretty much on the 
level of an Emily Post or a Dale Carnegie. An example, selected 
at random, may be found on page 122, where the following
truism is put in italics for emphasis: “Only such orders should
be given to the employee which have been first well thought 
through and deliberated upon by the employer.” 

The author does not like the idea that in order to safeguard an 
employee’s self-respect when criticizing him one should start 
out with some positive statement about things which he had 
done well. She says (p. 167), “This is undesirable, because the
culprit (sic) is in this way taught to overlook an unpleasant
situation. In this way his moral power of resistance is decreased. 
He is spoiled instead of hardened for the fight for survival.” 

Neither does she like the suggestion that the worker should 
be encouraged to make suggestions for improvements in the 
plant. She approvingly quotes E. Duebi (p. 203) who says 
that one should not be deceived about the results of such efforts.
“Encouragement of employees to make suggestions may easily
lead to a ‘know-better’ attitude. The employees then may 
interfere with other matters, express opinions about nearly 
everything, even about matters which do not pertain to their 
immediate tasks and in this way annoy their superiors. Such 
experiences, which are not rare, force many superiors to let the 
employees feel the ‘line,’ meaning the social distance, between 
them and their employers, which has always to be maintained.” 

In her concluding remarks, the author states most succinctly 
her own viewpoints. There (p. 282) she says that “the very
simple advice that a friendly word is better than criticism is 
difficult to apply. The hierarchical construction of the plant 
facilitates the development of a feeling of superiority, which all 
too frequently creates an empty space between employer and
employee and prevents immediate contact. On the other hand,
the low intellectual level and emotional attitudes of employees
which are due to their being incited by vicious propaganda fre-
quently prevents them from appreciating the good intentions
with which they are approached.”

This may be sufficient to describe an industrial psychology 
which basically caters to the employers and disparages the 
employees; which believes in a hierarchical organization of the 
plant and, despite contrary assertions, uses the frame of reference 
of trait psychology exclusively. It is a psychology devoted to 
the manipulation of employees in the sole interest of employers; 
it is definitely not a psychology of industrial relations. If
further proof were necessary, the author’s bibliography
provides it. It contains two hundred seventy-nine bibliograph- 
ical items which do not include the names of Elton Mayo, Roeth- 
lisberger and Dickson, or Whitehead. 

The most regrettable phenomenon is that this book was written 
and printed in Switzerland, one of the world’s oldest democracies. 
This has important implications for the most pressing problem 
of our times—the democratic education of Europe. It shows 
that seemingly true political democracy may exist in a relatively 
unindustrialized country despite undemocratic factory organiza- 
tion. The same is not true for highly industrialized countries, 
such as ours, and as Germany was. There the working of 
political democracy depends on the existence of a working 
democracy in the plants, on democratic human relations in 
industry. Not much can be hoped for Europe unless the author’s
suggestions for manipulation of the workers are replaced by truly
humane relations in industry.

BRUNO BETTELHEIM
The Department of Education
The University of Chicago










ARTHUR T. JERSILD AND ASSOCIATES. Child Development and the
Curriculum. New York: Bureau of Publications, Teachers
College, Columbia University, 1946, pp. 274. $2.75.


The curriculum of the elementary and secondary schools is 
the essential core of the total educational program. What shall 
be the nature of the curriculum has been the subject of consider-
able discussion since the beginning of systematic interest in
education as a social process. Since 1943 the staff of the Horace 
Mann-Lincoln Institute of School Experimentation has been 
engaged in a major research project on the curriculum. The 
point of view basic to this research is that the curriculum must 
be adjusted to the child and to the needs of our society. The 
present book is concerned with the former of these. 

Jersild and the members of his committee—Mary E. Chayer, 
Charlotte Fehlman, Gertrude Hildreth, and Marian Young— 
have undertaken to appraise, with critical insight, the literature 
of research in child development as it may contribute to, or 
affect, the formulation of the school curriculum. It is their 
thesis that curriculum-making should “take account of findings
with respect to what children are like at any given level, and 
knowledge concerning what is needed for the present and what 
will be valuable for the future.” The first two chapters are 
devoted to a discussion of the relations between child develop-
ment and the curriculum. In the following four chapters the
status of our present knowledge concerning infancy, the preschool
years, the elementary-school ages, and adolescence is skillfully
summarized with a bias toward the place of such knowledge in 
educational practice. An important aspect of their appraisal 
is the pointing out of areas in the field of child development where 
research is lacking and where unsupported opinion is accepted 
as truth. 

The presentation of the summary is intentionally made in a 
lucid style in order to introduce teachers (or parents for that 
matter) to this highly important field of research. The average 
reader is not diverted with the encumbrances of detailed research 
reports, nor do recurrent bibliographic notes intrude themselves. 
However, evidence of scholarship is abundantly given in twenty- 
eight pages of bibliography and notes arranged according to the 
sequence of the text. This is a valuable addition to educational
literature, useful not alone to the classroom teacher,
but to the research specialist as well.

C. M. LOUTTIT
Geneva, New York


THEODOR REIK. Psychology of Sex Relations. New York:
Farrar and Rinehart, Inc., 1945, pp. 243. 


There is more to love than sex. Love, ego gains, and sex 
gratification are all involved in what makes John and Jane love. 
This is the main theme of Reik’s volume about sex, which is 
dedicated to the author’s son, starts off with a preface describing 
John and Jane, lovers, and ends with a postlude saying farewell
to John and Jane and young couples like them with the final note 
that in the performance of the symphony of life the sex-drive 
plays among the first violins, but the concert-master is the ego. 
The content is presented in three parts: “The Nature of the Sex-
drive,” “Love and the Ego-drives,” “Love and Lust.”

The chief target is Freud, but so is he also the chief source of 
information as well as the chief point of departure. The criticism 
of Freud includes such items as an over-emphasis on the incest 
taboo as an explanation of impotence and frigidity, an over- 
emphasis on the Oedipus situation as the template which molds 
most of the vicissitudes of the sexual life of adults, and of the 
Freudian notion of polymorph-perverse. Illustrative is the
author’s reaction to the Oedipus situation, where he says,
“Toilet-training . . . paved the way for the difficulties of the
Oedipus situation which the psychoanalysts consider to be
the only source of all vicissitudes of sex. They forget that the 
vital functions of secretion molded the pattern for sex-behavior 
long before the Oedipus situation descended on the child. To 
trace all difficulties back to this one origin is arbitrary, less a 
scientific statement than an insult to our intelligence.” Strong
words for a relatively small difference in emphasis, and a
difference which Reik is not alone in making. He
ends this criticism of items and special phases, not the general 
doctrine of Freud, with a quotation from Spencer, saying that 
Freud is a genius whose libido system will have the sad destiny 
which Spencer once bemoaned in speaking of “a beautiful 
theory that was murdered by a gang of brutal facts.” The 
murder is being helped by Reik in the name of neo-psycho-
analysis, which conceives of psychological research into the
perversions as a study of violence and degradation, of fear and
defiance, rather than of sex. Homosexuality is similarly treated. 

On the positive side is his emphasis on man’s self-respect. In 
his little chapter on “The Essence of Romance” he makes the
generalization that “A man who does not accept himself and
does not regain enough self-respect will not be able to love. 
Who has not courage enough and cannot get enough self-con- 
fidence will not win another’s affection. Only the brave deserve 
the fair.” 

The captions used in considering this attitude suggest both the 
style and content. The chapter topics are: “The Desire to
be Desired,” “Response,” “Meeting and Melting.”

In spite of his over-emphasis on his criticism of Freud and 
his influence by Freud of overdoing the significance of neo- 
psychoanalysis, the book is well enough written so that it carries 
the general reader. The reviewer has given this volume to 
individuals with sex difficulties and most of them have liked it. 
For one thing, unlike other books on sex, people who are con- 
cerned with the psychology of sex relations do not have to delve 
through chapters on the anatomy of the male and of the female 
and other irrelevant physiological facts which are of little interest 
to them. This volume, at least, sticks to the psychological 
level of description and does talk in a manner that can be under- 
stood by people with sex and love difficulties. However, it is 
far from being a very comprehensive consideration of sex and 
marriage in terms of possible temperamental difficulties and 
motivation behind it. But it can serve usefully to people who 
want to know more about sex relations, their possible significance 
from the point of view of ego development, and it can be at 
least of help to people who are searching for such knowledge. 
It is a small book, well written, somewhat dissociated, and is 
characterized by continuity in the general themes considered. 
The style of writing is assertive but not unpleasant, and there 
is an assurance about the conclusions even when they are not 
justified, which is sometimes of help to people who are indecisive, 
but can be disturbing to others.

H. MELTZER
Psychological Service Center, St. Louis, Missouri











